Project Examples for Deep Structured Learning (Fall 2020)
We suggest below some project ideas. Feel free to use this as inspiration for your project. Talk to us for more details.
Emergent Communication
- Problem: Agents need to communicate to solve problems that require collaboration. The goal of this project is to apply techniques (for example using sparsemax or reinforcement learning) to induce communication among agents.
- Data: See references below.
- Evaluation: See references below.
- References:
- Serhii Havrylov and Ivan Titov. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. NeurIPS 2017.
- Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark. Emergent Communication through Negotiation. In ICLR 2018.
- Satwik Kottur, José Moura, Stefan Lee, Dhruv Batra. Natural Language Does Not Emerge Naturally in Multi-Agent Dialog. In EMNLP 2017.
Explainability of Neural Networks
- Problem: Neural networks are black boxes and not amenable to interpretation. The goal of this project is to develop and study methods that explain the predictions of neural network models (for example, using sparsemax attention). This project can be either a survey of recent work in this area or an exploration of some practical applications.
- Method: For example, sparse attention, rationalizers, gradient-based measures of feature importance, LIME, influence functions, etc.
- Data: BEER dataset, Stanford Sentiment Treebank, IMDB Large Movie Reviews Corpus, etc. See references below.
- References:
- Marco T Ribeiro, Sameer Singh, and Carlos Guestrin. Why Should I Trust You? Explaining the Predictions of Any Classifier. KDD 2016.
- Tao Lei, Regina Barzilay and Tommi Jaakkola. Rationalizing Neural Predictions. EMNLP 2016.
- Marcos V. Treviso, André F. T. Martins. Towards Prediction Explainability through Sparse Communication. Blackbox Workshop 2020.
- Zachary C Lipton. The mythos of model interpretability. ICML 2016 Workshop on Human Interpretability in Machine Learning.
- Sarah Wiegreffe and Yuval Pinter. Attention is not not explanation. EMNLP 2019.
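To make "gradient-based measures of feature importance" concrete, here is a minimal sketch of gradient-times-input attribution. It assumes a plain logistic model so that the gradient can be written by hand; the weights, bias and input are hypothetical values chosen for illustration, and a real project would use automatic differentiation on a neural network instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_times_input(weights, bias, x):
    """Return (probability, per-feature attributions) for a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    # For a logistic model, d p / d x_i = p * (1 - p) * w_i.
    grads = [p * (1.0 - p) * w for w in weights]
    # Gradient-times-input: scale each gradient by the feature value.
    attributions = [g * xi for g, xi in zip(grads, x)]
    return p, attributions


# Hypothetical model and input, for illustration only.
weights = [2.0, -1.0, 0.0]
bias = 0.0
x = [1.0, 1.0, 1.0]
prob, attr = gradient_times_input(weights, bias, x)
```

The first feature gets a positive attribution, the second a negative one, and the third (zero weight) gets none. Whether such scores are faithful explanations is exactly the kind of question the references above discuss.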
Generative Adversarial Networks for Discrete Data
- Problem: Compare different deep generative models' ability to generate discrete data (such as text).
- Methods: Generative Adversarial Networks.
- Data: SNLI (just the text), Yelp/Yahoo datasets for unaligned sentiment/topic transfer, other text data.
- Evaluation: Some of the metrics in [3].
- References:
Sub-quadratic Transformers
- Problem: Transformers and BERT models are extremely large and expensive to train and keep in memory. The goal of this project is to make Transformers more efficient in terms of time and memory complexity by reducing the quadratic cost of self-attention or by inducing a sparser and smaller model.
- Method: See references below.
- Data: WMT datasets, WikiText, etc. See in the original Transformer paper and references below.
- References:
- Tay, Yi, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient Transformers: A Survey. arXiv 2020 (covers e.g. Transformer-XL, Reformer, Linformer, Linear Transformer, Compressive Transformer, etc.)
- Gonçalo M. Correia, Vlad Niculae, André F.T. Martins. Adaptively Sparse Transformers. EMNLP 2019.
- Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin. Adaptive Attention Span in Transformers. ACL 2019.
- Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. Generating Long Sequences with Sparse Transformers. arXiv 2019.
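As a rough illustration of where the quadratic cost comes from and how sparsity reduces it, the sketch below implements attention restricted to a fixed local window. This is only one of the simplest sparsity patterns and is not the method of any particular reference; the function name `local_attention` and the toy inputs are assumptions made for the example.

```python
import math


def local_attention(queries, keys, values, window):
    """Softmax attention where position i only sees positions within `window` steps.

    Returns the outputs and the number of query-key scores computed.
    """
    n = len(queries)
    dim = len(values[0])
    outputs = []
    num_scores = 0
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) for j in range(lo, hi)
        ]
        num_scores += len(scores)
        # Numerically stable softmax over the window only.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        outputs.append(
            [sum(p * values[j][d] for p, j in zip(probs, range(lo, hi))) for d in range(dim)]
        )
    return outputs, num_scores


# Toy sequence of length 6 with 2-dimensional vectors (hypothetical data).
seq = [[float(i), 1.0] for i in range(6)]
_, full_cost = local_attention(seq, seq, seq, window=5)   # window covers everything
_, local_cost = local_attention(seq, seq, seq, window=1)  # only immediate neighbours
```

With a window that covers the whole sequence, the number of scores is n squared; with a fixed window it grows linearly in n. The price is that distant positions can no longer interact directly, which is what the methods in the references try to mitigate.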
Contextual Probabilistic Embeddings / Language Modeling
- Problem: Embedding words as vectors (aka point masses) cannot distinguish between more vague or more specific concepts. One solution is to embed words as a mean vector μ and a covariance Σ. Muzellec & Cuturi have a nice framework for this, tested for learning non-contextualized embeddings. Can we extend it to contextualized embeddings via language modelling? E.g. a model that reads an entire sentence and predicts a context-dependent pair (μ, Σ) for each word (perhaps left-to-right or masked). What likelihood to use? How can we evaluate the learned embeddings downstream?
- Method: See reference below.
- References:
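To compare two words embedded as (μ, Σ), one needs a distance or similarity between distributions rather than between points. One natural choice, assumed here for illustration, is the squared 2-Wasserstein distance between Gaussians. The sketch further assumes diagonal covariances, in which case the distance has a simple closed form; the general (full covariance) case is more involved and is not covered by this code.

```python
import math


def w2_squared_diag(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    With diagonal covariances it reduces to
    ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    scale_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + scale_term


# Hypothetical embeddings: a specific word (small variance) and a vague one
# (large variance) that share the same mean.
specific = ([0.0, 0.0], [1.0, 1.0])
vague = ([0.0, 0.0], [4.0, 4.0])
d = w2_squared_diag(specific[0], specific[1], vague[0], vague[1])
```

The two embeddings have identical means, so a point-mass embedding could not tell them apart, while the distance above is nonzero because the spreads differ.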
Constrained Structured Classification with AD3
- Problem: Use AD3 and dual decomposition techniques to impose logic/budget/structured constraints in structured problems. Possible tasks could involve generating diverse output, forbidding certain configurations, etc.
- Data:
- Evaluation: task-dependent
- References:
Structured multi-label classification
- Problem: Multi-label classification is a learning setting where every sample can be assigned zero, one or more labels.
- Method: Correlations between labels can be exploited by learning an affinity matrix of label correlations. Inference in a fully-connected correlation graph is hard; approximating the graph by a tree makes inference fast (Viterbi can be used).
- Data: Multi-label datasets
- Evaluation: see here
- References:
- Sorower. A Literature Survey on Algorithms for Multi-label Learning.
- Thomas Finley and Thorsten Joachims. 2008. Training structural SVMs when exact inference is intractable.
- Pystruct
- Scikit.ml (very strong methods not based on structured prediction)
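To illustrate why a tree-shaped correlation graph makes inference fast, here is a minimal sketch of Viterbi (max-sum) decoding for the simplest tree, a chain over binary labels. The score tables `unary` and `pairwise` are hypothetical; in the project they would come from a learned model, and a general tree would need message passing from the leaves to a root rather than a left-to-right sweep.

```python
def chain_map(unary, pairwise):
    """Exact MAP labelling of a chain of binary labels.

    unary[i][s]: score of label i taking state s (0 = off, 1 = on).
    pairwise[i][s][t]: score of label i in state s and label i + 1 in state t.
    """
    best = [list(unary[0])]
    back = []
    for i in range(1, len(unary)):
        row, ptr = [], []
        for t in (0, 1):
            cands = [best[i - 1][s] + pairwise[i - 1][s][t] for s in (0, 1)]
            s_star = 0 if cands[0] >= cands[1] else 1
            row.append(cands[s_star] + unary[i][t])
            ptr.append(s_star)
        best.append(row)
        back.append(ptr)
    # Backtrack from the best final state.
    last = 0 if best[-1][0] >= best[-1][1] else 1
    labels = [last]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels, max(best[-1])


# Hypothetical scores: label 1 is weakly rejected on its own (-0.2), but it
# co-occurs with label 0 (affinity 0.5 when both are on).
unary = [[0.0, 1.0], [0.0, -0.2], [0.0, -1.0]]
affinity = [[0.0, 0.0], [0.0, 0.5]]
pairwise = [affinity, affinity]
labels, score = chain_map(unary, pairwise)
```

Predicting each label independently would turn label 1 off, whereas joint decoding turns it on because of its affinity with label 0. The cost is linear in the number of labels, compared with enumerating all 2^L label sets.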
Hierarchical sparsemax attention
- Problem: Performing neural attention over very long sequences (e.g. for document-level classification, translation, ...)
- Method: sparse hierarchical attention with product of sparsemaxes.
- Data: text classification datasets
- Evaluation: Accuracy; empirical analysis of where the models attend.
- Notes: If the top-level sparsemax gives zero probability to some paragraphs, those can be pruned from the computation graph. Can this lead to speedups?
- References:
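The pruning question in the notes above can be made concrete with a small sketch of a two-level attention built as a product of sparsemaxes. The `sparsemax` function follows the usual sort-and-threshold projection onto the simplex; the two-level layout (paragraphs, then words), the function name `hierarchical_attention` and the scores are assumptions made for illustration.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # The support is the largest k with 1 + k * z_(k) > sum of the top k.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def hierarchical_attention(paragraph_scores, word_scores):
    """Word-level attention = paragraph probability * within-paragraph probability.

    Paragraphs with zero top-level probability are skipped entirely.
    Returns the attention weights and the number of paragraphs evaluated.
    """
    top = sparsemax(paragraph_scores)
    attention = []
    evaluated = 0
    for p, words in zip(top, word_scores):
        if p == 0.0:
            attention.append([0.0] * len(words))  # pruned: no word-level work
            continue
        evaluated += 1
        attention.append([p * w for w in sparsemax(words)])
    return attention, evaluated


# Hypothetical scores for a document with three paragraphs.
paragraph_scores = [2.0, 1.5, -1.0]
word_scores = [[1.0, 0.0], [0.2, 0.1, 0.0], [3.0, 1.0]]
attention, evaluated = hierarchical_attention(paragraph_scores, word_scores)
```

Here the third paragraph receives exactly zero probability, so its word-level scores never need to be normalized. In a real model the saving would come from also skipping the encoder computation for pruned paragraphs, which this sketch does not measure.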
Sparse link prediction
- Problem: Predicting links in a large structured graph. For instance: predict co-authorship, movie recommendation, coreference resolution, discourse relations between sentences in a document.
- Method: The simplest approach is independent binary classification: for every node pair (i, j), predict whether there is a link or not. This has two issues: (1) very high imbalance, since most node pairs are not linked; (2) structure and higher-order correlations are ignored. The goal is to develop a method that addresses both: incorporate structural correlations (e.g. with combinatorial inference, constraints, latent variables) and account for imbalance (ideally via pairwise ranking losses: learn a scorer S such that S(i, j) > S(k, l) whenever there is an edge (i, j) but no edge (k, l)).
- Data: arXiv macro usage, Coreference in quizbowl
- Notes: Can graph-CNNs (previous idea) be useful here?
- References:
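The pairwise ranking idea from the method above can be sketched as a hinge loss over (edge, non-edge) pairs. This is a minimal illustration, not a prescribed objective: the margin of 1, the averaging over all pairs and the hand-set scores are assumptions, and on a large graph the non-edges would have to be sampled rather than enumerated.

```python
def pairwise_hinge_loss(scores, edges, non_edges, margin=1.0):
    """Average hinge penalty whenever a true edge fails to outscore a non-edge by `margin`.

    scores: dict mapping a node pair (i, j) to its score S(i, j).
    """
    total, count = 0.0, 0
    for pos in edges:
        for neg in non_edges:
            total += max(0.0, margin - (scores[pos] - scores[neg]))
            count += 1
    return total / count if count else 0.0


# Hypothetical scores for four candidate links.
scores = {(0, 1): 2.0, (1, 2): 0.5, (0, 2): -1.0, (2, 3): 0.0}
edges = [(0, 1), (1, 2)]
non_edges = [(0, 2), (2, 3)]
loss = pairwise_hinge_loss(scores, edges, non_edges)
```

Only the pair ((1, 2), (2, 3)) violates the margin here. Because the loss depends only on score differences between edges and non-edges, it is not dominated by the large number of easy negatives in the way an independent binary classification loss would be.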
Energy Networks
- Problem: Energy networks can be used for density estimation (estimating p(x)) and structured prediction (estimating p(y|x)) when y is structured. Both cases pose challenges due to intractability of computing the partition function and sampling. In structured output prediction with energy networks, the idea is to replace discrete structured inference with continuous optimization in a neural net. This project can be either a survey about recent work in this area or it can explore some practical applications. Applications: multi-label classification and sequence tagging.
- Method: Learn a neural network E(x; w) to model the energy of x or E(x, y; w) to model the energy of an output configuration y (relaxed to be a continuous variable). Inference becomes min_y E(x, y; w). How far can this relaxation take us? Can it be better/faster than global combinatorial optimization approaches?
- Data: MNIST, multi-label classification, sequence tagging.
- References:
- LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. Predicting structured data, 1(0).
- Will Grathwohl, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky (2020). Your classifier is secretly an energy based model and you should treat it like one. ICLR 2020.
- Belanger and McCallum. Structured Prediction Energy Networks. ICML 2016.
- Belanger, Yang, McCallum. End-to-End Learning for Structured Prediction Energy Networks. ICML 2017.
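The inference step min_y E(x, y; w) with y relaxed to a continuous variable can be sketched as projected gradient descent over the box [0, 1]^L. Everything below is a toy under stated assumptions: `toy_energy` is a hand-written quadratic standing in for a learned network with x and w held fixed, and finite differences stand in for backpropagation.

```python
def numerical_grad(energy, y, eps=1e-5):
    """Central finite-difference gradient of `energy` at `y`."""
    grad = []
    for i in range(len(y)):
        y_hi = y[:i] + [y[i] + eps] + y[i + 1:]
        y_lo = y[:i] + [y[i] - eps] + y[i + 1:]
        grad.append((energy(y_hi) - energy(y_lo)) / (2 * eps))
    return grad


def infer(energy, num_labels, steps=200, lr=0.1):
    """Projected gradient descent on the relaxed output y in [0, 1]^L."""
    y = [0.5] * num_labels
    for _ in range(steps):
        g = numerical_grad(energy, y)
        # Gradient step followed by clipping back into the box.
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y


def toy_energy(y):
    """Hypothetical energy: local terms plus a coupling that ties labels 0 and 1."""
    local = [-1.0, 0.2, 1.0]  # negative = label preferred on, positive = preferred off
    return sum(l * yi for l, yi in zip(local, y)) + 0.5 * (y[0] - y[1]) ** 2


y_relaxed = infer(toy_energy, num_labels=3)
y_discrete = [1 if yi > 0.5 else 0 for yi in y_relaxed]
```

In this toy the coupling pulls label 1 on even though its local term alone would switch it off. The energy is convex, so gradient descent finds the global minimum; a learned energy network is generally non-convex, which is where the open questions in the method description come from.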
Memory-augmented Neural Networks
- Problem: Improve the generalization of neural nets by searching similar examples in the training set.
- Method: kNN + NN, fast search + NN, prototype attention (efficient attention over the dataset)
- Data: See in references below.
- References:
- Khandelwal, Urvashi, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through Memorization: Nearest Neighbor Language Models. ICLR 2020
- Wiseman, Sam, and Karl Stratos. Label-Agnostic Sequence Labeling by Copying Nearest Neighbors. ACL 2019
- Lample, Guillaume, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Large Memory Layers with Product Keys. NeurIPS 2019
- Hashimoto, Tatsunori B., Kelvin Guu, Yonatan Oren, and Percy S. Liang. A retrieve-and-edit framework for predicting structured outputs. NeurIPS 2018
- Guu, Kelvin, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. Generating Sentences by Editing Prototypes. TACL 2018
- Tu, Zhaopeng, Yang Liu, Shuming Shi, and Tong Zhang. Learning to Remember Translation History with a Continuous Cache. TACL 2018
- Weston, Jason, Emily Dinan, and Alexander H. Miller. Retrieve and Refine: Improved Sequence Generation Models For Dialogue. EMNLP - WSCAI 2018
- Gu, Jiatao, Yong Wang, Kyunghyun Cho, and Victor OK Li. Search Engine Guided Neural Machine Translation. AAAI 2018
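A minimal sketch of the "kNN + NN" idea, loosely in the spirit of nearest neighbor language models: retrieve the closest stored examples, turn their targets into a distribution, and interpolate it with the model's own prediction. The tiny memory, the squared Euclidean distance, the exhaustive search and the interpolation weight `lam` are all assumptions for illustration; a real system would use a fast approximate search over a large datastore.

```python
import math


def knn_distribution(query, memory, k, vocab_size):
    """Distribution over targets from the k nearest stored (key, target) pairs."""
    nearest = sorted(
        (sum((q - m) ** 2 for q, m in zip(query, key)), target)
        for key, target in memory
    )[:k]
    # Closer neighbours get larger weight: softmax over negative distances.
    weights = [math.exp(-dist) for dist, _ in nearest]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, target) in zip(weights, nearest):
        probs[target] += w / total
    return probs


def interpolate(p_model, p_knn, lam):
    """Mix the retrieval distribution into the model distribution."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_model)]


# Hypothetical memory of (representation, target id) pairs from the training set.
memory = [([0.0, 0.1], 2), ([0.1, 0.0], 2), ([5.0, 5.0], 0)]
p_model = [0.5, 0.3, 0.2]
p_knn = knn_distribution([0.0, 0.0], memory, k=2, vocab_size=3)
p_final = interpolate(p_model, p_knn, lam=0.25)
```

Both retrieved neighbours carry target 2, so its probability rises from 0.2 to about 0.4 without any change to the model's parameters.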
Quality Estimation and Uncertainty Estimation
- Problem: Estimate the quality of a translation hypothesis without access to reference translations.
- Method: See OpenKiwi: Neural Nets, Transfer Learning, BERT, XLM, etc.
- Data: See the WMT 2020 page.
- References:
- Kreutzer, Julia, Shigehiko Schamoni, and Stefan Riezler. QUality Estimation from ScraTCH (QUETCH): Deep Learning for Word-level Translation Quality Estimation. WMT 2015
- Kim, Hyun, Jong-Hyeok Lee, and Seung-Hoon Na. Predictor-Estimator using Multilevel Task Learning with Stack Propagation for Neural Quality Estimation. WMT 2017
- Wang, Jiayi, Kai Fan, Bo Li, Fengming Zhou, Boxing Chen, Yangbin Shi, and Luo Si. Alibaba Submission for WMT18 Quality Estimation Task. WMT 2018.
- Fabio Kepler, Jonay Trénous, Marcos Treviso, Miguel Vera, André F. T. Martins. OpenKiwi: An Open Source Framework for Quality Estimation. ACL 2019.
- Fomicheva, Marina, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, and Lucia Specia. Unsupervised Quality Estimation for Neural Machine Translation. arXiv preprint 2020
Causality and Disentanglement
- Problem: Causal inference and discovery is an area of growing interest in machine learning and statistics, with numerous applications and connections to confounding removal, reinforcement learning, and disentanglement of factors of variation. This project can be either a survey of the area or an exploration of some practical applications.
- Method: Plenty to choose from!
- Data: See the references below.
- References:
- Elias Bareinboim. Causal Reinforcement Learning. Tutorial in ICML 2020.
- Katherine A. Keith, David Jensen, and Brendan O'Connor. Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates.
- Bengio, Yoshua, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms. ICLR 2020
- Zhu, Shengyu, Ignavier Ng, and Zhitang Chen. Causal Discovery with Reinforcement Learning. ICLR 2020
- Schölkopf, Bernhard. Causality for Machine Learning
- Alvarez-Melis, David, and Tommi S. Jaakkola. A causal framework for explaining the predictions of black-box sequence-to-sequence models. EMNLP 2017