Project Examples for Deep Structured Learning (Spring 2022)

We suggest below some project ideas. Feel free to use this as inspiration for your project. Talk to us for more details.

Multimodal Deep Learning

Problem: Many tasks require combining different modalities, such as language and vision, language and speech, etc. This project will survey some of these approaches.
References:
1. Visual Question Answering
2. S. Frank, E. Bugliarello, and D. Elliott. "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers." EMNLP 2021.

Problem: Current models for text generation trained with maximum likelihood estimation often suffer from exposure bias. New metrics for text generation (such as COMET, BLEURT, BARTSCORE) offer new strategies to train systems by maximizing a better reward function, using reinforcement learning techniques.
References:

Problem: Current deep learning models lack higher-order cognition capabilities. Global Workspace Theory is a cognitive science theory of consciousness that serves as inspiration for endowing neural networks with such capabilities.
References:
1. Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio. "Coordination Among Neural Modules Through a Shared Global Workspace."
2. Anirudh Goyal, Yoshua Bengio. "Inductive Biases for Deep Learning of Higher-Level Cognition."

Problem: Survey Generative Flow Networks (GFlowNets), a recent method that samples a diverse set of candidates in an active learning context, with a training objective that makes the sampler draw candidates approximately in proportion to a given reward function.
References:
1. Yoshua Bengio, Tristan Deleu, Edward J. Hu, Salem Lahlou, Mo Tiwari, Emmanuel Bengio. "GFlowNet Foundations."
2. Yoshua Bengio. Generative Flow Networks.

Problem: Large pretrained language models can be expensive to fine-tune. Lightweight strategies include adapters and prompting methods. The goal is to survey and potentially experiment with some of these techniques.
Data: See papers.
References:
1. Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, Graham Neubig. "Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing" and references therein.

Problem: How to estimate the uncertainty of a classifier or regressor?
Method: Monte Carlo dropout, deep ensembles, heteroscedastic regression, direct epistemic uncertainty prediction, etc.
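As a rough illustration of the first method listed, Monte Carlo dropout keeps dropout active at prediction time and reads the spread over several stochastic forward passes as an uncertainty estimate. The sketch below is plain Python on a hypothetical one-hidden-layer regressor: the weights `W_HIDDEN` and `W_OUT`, the dropout rate, and all function names are illustrative assumptions, not taken from the references.

```python
import random
import statistics

def forward(x, w_hidden, w_out, drop_p, rng):
    """One stochastic forward pass of a tiny one-hidden-layer regressor."""
    hidden = []
    for w, b in w_hidden:
        h = max(0.0, w * x + b)  # ReLU unit
        # Dropout stays ON at prediction time; inverted scaling keeps the
        # expected activation equal to the deterministic one.
        keep = rng.random() >= drop_p
        hidden.append(h / (1.0 - drop_p) if keep else 0.0)
    return sum(v * h for v, h in zip(w_out, hidden))

def mc_dropout_predict(x, w_hidden, w_out, drop_p=0.5, n_samples=200, seed=0):
    """Return (predictive mean, predictive std) over n_samples dropout masks."""
    rng = random.Random(seed)
    samples = [forward(x, w_hidden, w_out, drop_p, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)

# Hypothetical fixed weights standing in for a trained network.
W_HIDDEN = [(1.0, 0.0), (-1.0, 0.5), (0.5, -0.2), (2.0, 0.1)]  # (weight, bias)
W_OUT = [0.7, -0.3, 1.2, 0.4]

mean, std = mc_dropout_predict(1.5, W_HIDDEN, W_OUT)
print(f"prediction {mean:.3f} +/- {std:.3f}")
```

With `drop_p=0.0` every pass is identical and the reported spread collapses to zero, which is a quick sanity check. A real project would apply the same idea to a trained network in a deep learning framework.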
References:
1. Moksh Jain, Salem Lahlou, Hadi Nekoei, Victor Butoi, Paul Bertin, Jarrid Rector-Brooks, Maksym Korablyov, Yoshua Bengio. "DEUP: Direct Epistemic Uncertainty Prediction."
2. Alex Kendall, Yarin Gal. "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" NeurIPS 2017.

Problem: Transformers and BERT models are extremely large and expensive to train and keep in memory. The goal of this project is to survey methods that make Transformers more efficient in time and memory, or to experiment with them, either by reducing the quadratic cost of self-attention or by inducing a sparser and smaller model. One possibility is to experiment with the recently proposed S4 model [1,2].
Method: See references below.
Data: Wikitext, Long Range Arena, etc. See the original Transformer paper and the references below.
References:
1. Albert Gu, Karan Goel, Christopher Ré. "Efficiently Modeling Long Sequences with Structured State Spaces." ICLR 2022
2. Sasha Rush. "The Annotated S4."
3. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. "Efficient Transformers: A Survey." ArXiv 2020 (e.g. Transformer-XL, Reformer, Linformer, Linear transformer, Compressive Transformer, etc.)
4. Gonçalo M. Correia, Vlad Niculae, André F.T. Martins. Adaptively Sparse Transformers. EMNLP 2019.
5. Choromanski et al. "Rethinking Attention with Performers". ICLR 2021

Problem: Use AD3 and dual decomposition techniques to impose logic/budget/structured constraints in structured problems. Possible tasks could involve generating diverse output, forbidding certain configurations, etc.
Data:
- "weasel words": detecting hedges/uncertainty in writing in order to improve clarity
- Coreference in quizbowl
Evaluation: task-dependent
References:

Problem: Performing neural attention over very long sequences (e.g. for document-level classification, translation, ...)
Method: sparse hierarchical attention with product of sparsemaxes.
Data: text classification datasets
Evaluation: Accuracy; empirical analysis of where the models attend.
Notes: If the top-level sparsemax gives zero probability to some paragraphs, those can be pruned from the computation graph. Can this lead to speedups?
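The pruning question in the note can be made concrete with a small sketch. It implements sparsemax by the usual sort-and-threshold procedure, then combines a paragraph-level sparsemax with word-level sparsemaxes by multiplication. The function `hierarchical_attention` and the toy scores are assumptions made for illustration; in a real model the scores would come from learned encoders.

```python
def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        # The support is the largest k for which this inequality holds.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]

def hierarchical_attention(paragraph_scores, word_scores):
    """Word weight = paragraph sparsemax weight * within-paragraph sparsemax weight."""
    top = sparsemax(paragraph_scores)
    weights = []
    for p_par, scores in zip(top, word_scores):
        if p_par == 0.0:
            # Pruned paragraph: the word-level sparsemax is never computed.
            weights.append([0.0] * len(scores))
        else:
            weights.append([p_par * p for p in sparsemax(scores)])
    return top, weights

# Toy scores: three paragraphs with 3, 2, and 2 words.
top, weights = hierarchical_attention(
    [2.0, 1.5, -1.0],
    [[1.0, 0.0, 3.0], [0.2, 0.1], [5.0, 4.0]],
)
print(top)      # the third paragraph receives exactly zero weight
print(weights)
```

Because the third paragraph gets exactly zero probability at the top level, its word-level scores are never touched. Whether skipping that work gives a measurable speedup on real documents is the open question of the project.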
References:
1. Yang, Yang, Dyer, He, Smola, Hovy. Hierarchical Attention Networks for Document Classification. NAACL 2016.
2. Martins and Astudillo. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. ICML 2016.

Problem: Energy networks can be used for density estimation (estimating p(x)) and structured prediction (estimating p(y|x)) when y is structured. Both cases pose challenges due to intractability of computing the partition function and sampling. In structured output prediction with energy networks, the idea is to replace discrete structured inference with continuous optimization in a neural net. This project can be either a survey about recent work in this area or it can explore some practical applications. Applications: multi-label classification and sequence tagging.
Method: Learn a neural network E(x; w) to model the energy of x or E(x, y; w) to model the energy of an output configuration y (relaxed to be a continuous variable). Inference becomes min_y E(x, y; w). How far can this relaxation take us? Can it be better/faster than global combinatorial optimization approaches?
Data: MNIST, multi-label classification, sequence tagging.
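To illustrate inference as min_y E(x, y; w) over a relaxed output, the sketch below runs projected gradient descent on y in [0, 1]^L for a multi-label toy problem. The energy here is a hand-written quadratic stand-in, not a trained network: `SCORES` plays the role of per-label scores computed from x, and `PENALTY` is a hypothetical symmetric matrix of pairwise label incompatibilities. All names and numbers are assumptions for illustration.

```python
def energy(y, scores, penalty):
    """E(y) = -sum_i s_i y_i + sum_{i<j} c_ij y_i y_j (lower is better)."""
    n = len(y)
    unary = -sum(s * y_i for s, y_i in zip(scores, y))
    pairwise = sum(penalty[i][j] * y[i] * y[j]
                   for i in range(n) for j in range(i + 1, n))
    return unary + pairwise

def energy_grad(y, scores, penalty):
    """Gradient of the energy with respect to each relaxed label y_i."""
    n = len(y)
    return [-scores[i] + sum(penalty[i][j] * y[j] for j in range(n) if j != i)
            for i in range(n)]

def infer(scores, penalty, steps=200, lr=0.1):
    """Projected gradient descent: step downhill, then clip back into [0, 1]."""
    y = [0.5] * len(scores)
    for _ in range(steps):
        grad = energy_grad(y, scores, penalty)
        y = [min(1.0, max(0.0, y_i - lr * g)) for y_i, g in zip(y, grad)]
    return y

SCORES = [2.0, 1.5, -1.0]          # hypothetical label scores derived from x
PENALTY = [[0.0, 3.0, 0.0],        # labels 0 and 1 are mutually discouraged
           [3.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]

y_hat = infer(SCORES, PENALTY)
print(y_hat, energy(y_hat, SCORES, PENALTY))
```

In this toy case the relaxed solution lands on a corner of the box, so rounding loses nothing. With a learned, non-convex energy, gradient descent may stop at a local minimum, which is part of what the project would investigate.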
References:

Problem: Improve the generalization of neural nets by searching similar examples in the training set.
Method: kNN + NN, fast search + NN, prototype attention (efficient attention over the dataset)
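One simple reading of "kNN + NN" is to interpolate the network's predicted label distribution with a distribution obtained from the nearest training examples. The sketch below assumes that examples are already represented as feature vectors (in practice these might be hidden states of the network) and uses brute-force Euclidean search. The function names, the interpolation weight `lam`, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

def knn_distribution(query, train_x, train_y, labels, k=3):
    """Label distribution given by the k nearest training examples."""
    k = min(k, len(train_x))
    nearest = sorted((math.dist(query, x), y) for x, y in zip(train_x, train_y))[:k]
    votes = Counter(y for _, y in nearest)
    return {label: votes[label] / k for label in labels}

def interpolate(p_model, p_knn, lam=0.5):
    """Mix the two distributions: lam * kNN + (1 - lam) * model."""
    return {label: lam * p_knn[label] + (1.0 - lam) * p_model[label]
            for label in p_model}

# Toy training set in a 2-D feature space.
TRAIN_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
TRAIN_Y = ["a", "a", "a", "b", "b"]

p_model = {"a": 0.4, "b": 0.6}   # hypothetical network output for the query
p_knn = knn_distribution((0.2, 0.1), TRAIN_X, TRAIN_Y, labels=["a", "b"])
p_final = interpolate(p_model, p_knn)
print(p_knn, p_final)
```

Here the retrieved neighbours all carry label "a", so the mixture overturns the network's preference for "b". Brute-force search scales linearly with the training set; the "fast search" variant would swap it for an approximate nearest-neighbour index.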
Data: See references below.
References:

Problem: Estimate the quality of a translation hypothesis without access to reference translations.
Method: See OpenKiwi: Neural Nets, Transfer Learning, BERT, XLM, etc.
Data: See the WMT2020 page.
References:

Problem: Causal inference and discovery is an area of growing interest in machine learning and statistics, with numerous applications and connections to confounding removal, reinforcement learning, and disentanglement of factors of variation. This project can be either a survey of the area or an exploration of some practical applications.
Method: Plenty to choose from!
Data: See the references below.
References:

Problem: Agents need to communicate to solve problems that require collaboration. The goal of this project is to apply techniques (for example using sparsemax or reinforcement learning) to induce communication among agents.
Data: See references below.
Evaluation: See references below.
References: