Project Examples for Deep Structured Learning (Spring 2023)

Below are some suggested project ideas. Feel free to use them as inspiration for your project. Talk to us for more details.

Multimodal Deep Learning

Type: Survey.
Problem: Many tasks require combining different modalities, such as language and vision, language and speech, etc. This project will survey some of these approaches.
References:

Global Workspace Theory

Problem: Current deep learning models lack higher-order cognition capabilities. Global Workspace Theory is a cognitive science theory of consciousness that serves as inspiration for endowing neural networks with such capabilities.
References:
1. Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio. "Coordination Among Neural Modules Through a Shared Global Workspace."
2. Anirudh Goyal, Yoshua Bengio. "Inductive Biases for Deep Learning of Higher-Level Cognition."

Reinforcement Learning with Human Feedback / Constitutional AI

Type: Practical project or survey.
Problem: Current large language models such as ChatGPT include a final step of reinforcement learning with human feedback, where human preferences are taken into account to produce a reward model which is then used to fine-tune a pretrained model.
References:
1. Stiennon et al. "Learning to summarize from human feedback". NeurIPS 2020.
2. Bai et al. "Constitutional AI: Harmlessness from AI Feedback". 2022.
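
The reward-modeling step described above is usually trained on pairwise human comparisons. As a minimal sketch, assuming the pairwise logistic loss used by Stiennon et al. [1], the snippet below scores a batch of (preferred, dispreferred) pairs. The function name and the toy scores are illustrative assumptions, not part of any library.

```python
import numpy as np

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Mean of -log sigmoid(r_chosen - r_rejected) over a batch of comparisons."""
    margin = np.asarray(reward_chosen, dtype=float) - np.asarray(reward_rejected, dtype=float)
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Hypothetical reward-model scores for three (preferred, dispreferred) pairs.
loss_agree = pairwise_preference_loss([2.0, 1.5, 0.3], [0.5, -1.0, 0.1])
# Same pairs with the scores swapped, i.e. the model disagrees with the annotators.
loss_disagree = pairwise_preference_loss([0.5, -1.0, 0.1], [2.0, 1.5, 0.3])
```

The loss is small when the reward model ranks the human-preferred output higher, which is what makes the learned reward usable as a fine-tuning signal.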

Reinforcement Learning for Text Generation

Type: Practical project or survey.
Problem: Current text generation models trained with maximum likelihood estimation often suffer from exposure bias. New metrics for text generation (such as COMET, BLEURT, and BARTScore) make it possible to train systems to maximize a better reward function using reinforcement learning techniques.
References:
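
To make the idea of maximizing a reward with reinforcement learning concrete, here is a minimal sketch of the score-function (REINFORCE) gradient on a toy one-step problem. It assumes a softmax policy over four candidate outputs, with fixed hypothetical rewards standing in for a metric such as COMET; a real system would sample full sequences from a language model.

```python
import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate of d E[R] / d logits, using a mean-reward baseline."""
    probs = softmax(logits)
    samples = rng.choice(len(probs), size=num_samples, p=probs)
    sampled_rewards = rewards[samples]
    baseline = sampled_rewards.mean()
    # Gradient of log softmax at sample y is onehot(y) - probs.
    scores = np.eye(len(probs))[samples] - probs
    return ((sampled_rewards - baseline)[:, None] * scores).mean(axis=0)

rng = np.random.default_rng(0)
rewards = np.array([0.1, 0.9, 0.4, 0.2])  # hypothetical metric score per candidate
logits = np.zeros(4)
for _ in range(200):
    # Gradient ascent on the expected reward.
    logits += 0.5 * reinforce_gradient(logits, rewards, num_samples=64, rng=rng)
final_probs = softmax(logits)
```

After training, the policy concentrates its probability mass on the candidate with the highest reward.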

GFlowNets

Type: Survey.
Problem: Survey Generative Flow Networks (GFlowNets), a recent method for sampling a diverse set of candidates in an active learning context, with a training objective that makes the model sample approximately in proportion to a given reward function.
References:
1. Yoshua Bengio, Tristan Deleu, Edward J. Hu, Salem Lahlou, Mo Tiwari, Emmanuel Bengio. "GFlowNet Foundations."
2. Yoshua Bengio. Generative Flow Networks.

Memory consolidation and continual learning

Type: Survey.
Problem: Current deep learning models either store long-term memory in their parameters or learn new tasks on the fly through in-context learning. They lack a middle ground that would let them assimilate new information continually from their interactions and experience. The ability to learn on a continual basis is highly desirable in many applications and mimics what humans do via memory consolidation. This project will survey existing work in this area, potentially exploring connections with neuroscience.
References:
1. De Lange et al. "A continual learning survey: Defying forgetting in classification tasks". TPAMI 2021.
2. Biesialska et al. "Continual Lifelong Learning in Natural Language Processing: A Survey", COLING 2020.

Prompting / Adaptors / Retrieval for NLP Tasks

Type: Practical project.
Problem: Large pretrained language models can be expensive to fine-tune. Lightweight strategies include adaptors and prompting methods, as well as retrieval-based techniques. The goal is to experiment with (or combine) some of these techniques. For example, can we use retrieval (with a similarity search engine like FAISS) to select good examples in a few-shot learning setting?
Data: See papers.
References:
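
As a starting point for the retrieval question above, the following sketch selects few-shot demonstrations by cosine similarity and formats them into a prompt. It uses exact search in NumPy rather than FAISS, and the 2-d vectors are toy stand-ins for the output of a sentence encoder; the function names and the prompt template are assumptions made for illustration.

```python
import numpy as np

def retrieve_demonstrations(query_vec, pool_vecs, k):
    """Return indices of the k pool examples most cosine-similar to the query."""
    pool_norm = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = pool_norm @ query_norm
    return np.argsort(-scores)[:k]

def build_prompt(query_text, pool_texts, pool_labels, indices):
    """Format retrieved (input, label) pairs as few-shot demonstrations."""
    shots = [f"Input: {pool_texts[i]}\nLabel: {pool_labels[i]}" for i in indices]
    return "\n\n".join(shots + [f"Input: {query_text}\nLabel:"])

# Toy labeled pool with hypothetical embeddings.
pool_texts = ["great movie", "terrible plot", "loved the acting"]
pool_labels = ["positive", "negative", "positive"]
pool_vecs = np.array([[1.0, 0.1], [-1.0, 0.2], [0.9, 0.3]])
query_vec = np.array([1.0, 0.2])

top = retrieve_demonstrations(query_vec, pool_vecs, k=2)
prompt = build_prompt("wonderful film", pool_texts, pool_labels, top)
```

For a large pool, the exact search in `retrieve_demonstrations` is the part an approximate nearest-neighbor index would replace.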

Uncertainty Quantification / Conformal Prediction

Type: Practical project.
Problem: How to estimate the uncertainty of a structured classifier or regressor?
Method: Monte Carlo dropout, deep ensembles, heteroscedastic regression, direct epistemic uncertainty prediction, etc.
References:
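
Since the title mentions conformal prediction, here is a minimal sketch of split conformal prediction for regression. It assumes absolute residuals as the nonconformity score and an already fitted point predictor (here a hypothetical fixed linear function); the data are synthetic.

```python
import math
import numpy as np

def conformal_quantile(cal_residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration residuals."""
    scores = np.sort(np.abs(cal_residuals))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the order statistic
    return float("inf") if rank > n else float(scores[rank - 1])

def predict(x):
    # Hypothetical pre-trained point predictor; any regressor could be used here.
    return 2.0 * x

rng = np.random.default_rng(0)
x_cal = rng.uniform(0, 1, size=500)
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, size=500)
x_test = rng.uniform(0, 1, size=2000)
y_test = 2.0 * x_test + rng.normal(0, 0.1, size=2000)

# Calibrate on held-out data, then build symmetric intervals around each prediction.
q = conformal_quantile(y_cal - predict(x_cal), alpha=0.1)
lower, upper = predict(x_test) - q, predict(x_test) + q
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
```

With `alpha=0.1` the empirical coverage on the test set should land close to 90%, provided calibration and test points are exchangeable.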

Sub-quadratic Sequence Models with S4

Type: Practical project.
Problem: Transformers such as BERT are large and expensive to train and keep in memory. The goal of this project is to survey methods for making Transformers more efficient in time and memory, or to implement one, either by reducing the quadratic cost of self-attention or by inducing a sparser and smaller model. One possibility is to experiment with the recently proposed S4 model [1,2].
Method: See references below.
Data: Wikitext, Long Range Arena, etc. See the original Transformer paper and the references below.
References:
1. Albert Gu, Karan Goel, Christopher Ré. "Efficiently Modeling Long Sequences with Structured State Spaces." ICLR 2022.
2. Sasha Rush. "The Annotated S4."
3. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. "Efficient Transformers: A Survey." ArXiv 2020 (e.g. Transformer-XL, Reformer, Linformer, Linear transformer, Compressive Transformer, etc.).
4. Gonçalo M. Correia, Vlad Niculae, André F.T. Martins. "Adaptively Sparse Transformers." EMNLP 2019.
5. Choromanski et al. "Rethinking Attention with Performers." ICLR 2021.
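
S4 [1,2] builds on discrete linear state space models, which can be evaluated either as a recurrence or as a convolution. The sketch below only illustrates that equivalence, with small random matrices; it assumes a single scalar input channel and leaves out S4's structured initialization and its efficient kernel computation.

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k step by step."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_kernel(A, B, C, length):
    """Convolution kernel K = (CB, CAB, CA^2B, ...), truncated to `length`."""
    kernel, v = [], B.copy()
    for _ in range(length):
        kernel.append(C @ v)
        v = A @ v
    return np.array(kernel)

def ssm_convolutional(A, B, C, u):
    """Compute the same outputs as a causal convolution of u with the kernel."""
    K = ssm_kernel(A, B, C, len(u))
    return np.convolve(u, K)[: len(u)]

rng = np.random.default_rng(0)
A = 0.3 * rng.normal(size=(4, 4))  # toy state matrix, not an S4 initialization
B, C = rng.normal(size=4), rng.normal(size=4)
u = rng.normal(size=16)

y_rec = ssm_recurrent(A, B, C, u)
y_conv = ssm_convolutional(A, B, C, u)
```

The two outputs agree up to floating-point error. `np.convolve` is itself quadratic in the sequence length; replacing it with an FFT-based convolution is what brings the cost down to O(L log L) once the kernel is available.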

Constrained Structured Classification with AD3

Problem: Use AD3 and dual decomposition techniques to impose logic/budget/structured constraints in structured problems. Possible tasks could involve generating diverse output, forbidding certain configurations, etc.
Data:
- "weasel words": detecting hedges/uncertainty in writing in order to improve clarity
- Coreference in quizbowl
Evaluation: task-dependent
References:

Energy-Based Models / Diffusion Models

Problem: Energy networks can be used for density estimation (estimating p(x)) and structured prediction (estimating p(y|x)) when y is structured. Both cases pose challenges due to intractability of computing the partition function and sampling. In structured output prediction with energy networks, the idea is to replace discrete structured inference with continuous optimization in a neural net. This project can be either a survey about recent work in this area or it can explore some practical applications. Applications: multi-label classification and sequence tagging.
Method: Learn a neural network E(x; w) to model the energy of x or E(x, y; w) to model the energy of an output configuration y (relaxed to be a continuous variable). Inference becomes min_y E(x, y; w). How far can this relaxation take us? Can it be better/faster than global combinatorial optimization approaches?
Data: MNIST, multi-label classification, sequence tagging.
References:
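
The inference step min_y E(x, y; w) over a relaxed output can be sketched with projected gradient descent. The energy below is a hypothetical quadratic, E(x, y; W) = -y·(Wx) + 0.5·||y||^2, chosen only because its minimizer over [0, 1]^L is known in closed form (clip(Wx, 0, 1)), which makes the loop easy to check. An actual energy network would replace it with a neural net and automatic differentiation.

```python
import numpy as np

def energy(x, y, W):
    """Toy energy: low when y agrees with the label scores W @ x."""
    return -y @ (W @ x) + 0.5 * y @ y

def energy_grad_y(x, y, W):
    """Gradient of the toy energy with respect to the relaxed output y."""
    return -(W @ x) + y

def infer(x, W, steps=100, lr=0.2):
    """Approximate argmin_y E(x, y; W) over [0, 1]^L by projected gradient descent."""
    y = np.full(W.shape[0], 0.5)
    for _ in range(steps):
        # Gradient step followed by projection back onto the box [0, 1]^L.
        y = np.clip(y - lr * energy_grad_y(x, y, W), 0.0, 1.0)
    return y

W = np.array([[1.0, -0.5], [0.2, 0.3], [-1.0, 0.4]])  # hypothetical weights, 3 labels
x = np.array([2.0, 1.0])
y_hat = infer(x, W)
labels = (y_hat > 0.5).astype(int)  # round the relaxed output for multi-label prediction
```

Here `W @ x` is `[1.5, 0.7, -1.6]`, so the relaxed solution converges to `[1.0, 0.7, 0.0]` and the rounded prediction is `[1, 1, 0]`.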

Causal Representation Learning / Causal Structure Models

Type: Survey or practical project.
Problem: Causal inference and discovery is an area of growing interest in machine learning and statistics, with numerous applications and connections to confounding removal, reinforcement learning, and disentanglement of factors of variation. This project can be either a survey about the area or it can explore some practical applications.
Method: Plenty to choose from!
Data: See the references below.
References:

Associative Memory with Modern Hopfield networks

Type: Survey or practical project.
Problem: Hopfield networks [1] are recurrent neural networks whose dynamical trajectories converge to fixed-point attractor states determined by the minimization of an energy function. These networks can be regarded as models of associative memory. Recently, a modern family of Hopfield networks has been proposed and studied, with interesting properties such as a much larger storage capacity and a close connection to Transformer attention.
References:
1. Hopfield, John (1982). "Neural networks and physical systems with emergent collective computational abilities".
2. Ramsauer, Hubert; et al. (2021). "Hopfield Networks is All You Need". ICLR 2021.
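
As a small illustration of associative retrieval in a modern Hopfield network, the sketch below applies one step of the softmax update rule from Ramsauer et al., in which the new state is a softmax-weighted average of the stored patterns. The pattern count, dimension, noise level, and `beta` are arbitrary choices for the example.

```python
import numpy as np

def hopfield_update(patterns, query, beta=1.0):
    """One retrieval step: softmax(beta * patterns @ query) averaged over patterns."""
    scores = beta * patterns @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ patterns

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(5, 64))  # five stored +/-1 patterns (rows)
noisy = patterns[2].copy()
noisy[:10] *= -1.0                                # corrupt 10 of the 64 entries
retrieved = hopfield_update(patterns, noisy)
```

A single update is typically enough to recover the stored pattern from the corrupted query, because the softmax puts almost all of its weight on the closest stored pattern.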

Emergent Communication

Problem: Agents need to communicate to solve problems that require collaboration. The goal of this project is to apply techniques (for example using sparsemax or reinforcement learning) to induce communication among agents.
Data: See references below.
Evaluation: See references below.
References:
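
Since sparsemax is mentioned as one possible technique, here is a minimal NumPy sketch of the sparsemax transformation using its standard sort-and-threshold algorithm. In an emergent communication setup, `z` could be a sender's scores over a discrete vocabulary of symbols; that usage and the example scores are assumptions for illustration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    # The support size is the largest k with 1 + k * z_(k) > sum of the top-k scores.
    support = 1 + ks * z_sorted > cumsum
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1) / k  # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical sender scores over a three-symbol vocabulary.
message_probs = sparsemax([1.0, 0.8, -1.0])
```

Unlike softmax, the output can assign exactly zero probability to low-scoring symbols (here the third one), which yields sparse, nearly discrete messages while remaining differentiable almost everywhere.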