Project Examples for Deep Structured Learning (Fall 2018)
Below we suggest some project ideas. Feel free to use them as inspiration for your project, and talk to us for more details.
Sparse Classification with Sparsemax
- Problem: Apply sparsemax and/or sparsemax loss to a problem that requires outputting sparse label probabilities or sparse latent variables (attention).
- Data: Multi-label datasets, SNLI, WMT, any data containing many labels for which only a few are plausible for each example.
- Evaluation: F1, accuracy, inspection of where the model learns to attend.
- References:
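As a starting point, sparsemax can be computed in closed form as the Euclidean projection of the score vector onto the probability simplex. The following is a minimal pure-Python sketch of that projection; the function name `sparsemax` and the list-based interface are illustrative choices, not part of any particular library.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        # The k-th largest score is in the support while
        # 1 + k * z_k > (sum of the k largest scores).
        # The condition holds for a prefix of k values only, so the last
        # update of tau corresponds to the full support.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    # Scores at or below the threshold tau get exactly zero probability.
    return [max(s - tau, 0.0) for s in scores]


print(sparsemax([1.0, 0.8, -1.0]))
```

Unlike softmax, the output can contain exact zeros, which is what makes it suitable for sparse label probabilities or sparse attention.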
Deep Generative Models for Discrete Data
- Problem: Compare different deep generative models' ability to generate discrete data (such as text).
- Methods: Generative Adversarial Networks, Variational Auto-Encoders.
- Data: SNLI (just the text), Yelp/Yahoo datasets for unaligned sentiment/topic transfer, other text data.
- Evaluation: Some of the metrics in [4].
- References:
  - Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio. Generative Adversarial Networks. NIPS 2014.
  - Kingma and Welling. Auto-Encoding Variational Bayes. ICLR 2014.
  - Zhao, Kim, Zhang, Rush, LeCun. Adversarially Regularized Autoencoders. ICML 2018.
  - Semeniuta, Severyn, Gelly. On Accurate Evaluation of GANs for Language Generation. 2018.
Constrained Structured Classification with AD3
- Problem: Use AD3 and dual decomposition techniques to impose logic/budget/structured constraints in structured problems. Possible tasks could involve generating diverse output, forbidding certain configurations, etc.
- Data:
- Evaluation: task-dependent
- References:
Structured multi-label classification
- Problem: Multi-label classification is a learning setting where every sample can be assigned zero, one or more labels.
- Method: Correlations between labels can be exploited by learning a label affinity (correlation) matrix. Exact inference in a fully connected correlation graph is hard; approximating the graph by a tree makes inference fast, since max-product dynamic programming (Viterbi) can then be used.
- Data: Multi-label datasets
- Evaluation: see here
- References:
  - Sorower. A Literature Survey on Algorithms for Multi-label Learning.
  - Thomas Finley and Thorsten Joachims. 2008. Training structural SVMs when exact inference is intractable.
  - Pystruct
  - Scikit.ml (very strong methods not based on structured prediction)
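To make the tree-structured inference in the method above concrete, here is a small sketch of Viterbi (max-sum) decoding for binary labels. It assumes the simplest tree, a chain over the labels, and hypothetical score tables `unary` and `pairwise`; a general tree would use the same recursion from the leaves to the root.

```python
def chain_map_labels(unary, pairwise):
    """Max-sum (Viterbi) decoding of binary labels arranged on a chain.

    unary[i][b]:       score of label i taking value b in {0, 1}.
    pairwise[i][a][b]: score of (label i = a, label i + 1 = b).
    Returns the highest-scoring assignment and its score.
    """
    n = len(unary)
    best = [list(unary[0])]  # best[i][b]: best prefix score with y_i = b
    back = []                # back[i - 1][b]: best value of y_{i-1} given y_i = b
    for i in range(1, n):
        scores, pointers = [], []
        for b in (0, 1):
            cands = [best[i - 1][a] + pairwise[i - 1][a][b] for a in (0, 1)]
            a_star = 0 if cands[0] >= cands[1] else 1
            scores.append(cands[a_star] + unary[i][b])
            pointers.append(a_star)
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final value.
    y = [0 if best[-1][0] >= best[-1][1] else 1]
    for pointers in reversed(back):
        y.append(pointers[y[-1]])
    y.reverse()
    return y, max(best[-1])


# Toy example: labels 0 and 1 like to co-occur, labels 1 and 2 exclude each other.
unary = [[0.0, 1.0], [0.2, 0.0], [0.0, 0.3]]
pairwise = [
    [[0.0, 0.0], [0.0, 1.5]],
    [[0.0, 0.0], [0.0, -2.0]],
]
print(chain_map_labels(unary, pairwise))
```

In this toy example, predicting each label independently from the unary scores would give `[1, 0, 1]`, while joint decoding switches label 1 on and label 2 off because of the pairwise terms.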
Hierarchical sparsemax attention
- Problem: Performing neural attention over very long sequences (e.g. for document-level classification, translation, ...)
- Method: sparse hierarchical attention with product of sparsemaxes.
- Data: text classification datasets
- Evaluation: Accuracy; empirical analysis of where the models attend.
- Notes: If the top-level sparsemax gives zero probability to some paragraphs, those can be pruned from the computation graph. Can this lead to speedups?
- References:
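The pruning idea in the notes above can be sketched as follows. This is only an illustration under simple assumptions: a two-level hierarchy (paragraphs, then words), precomputed scores, and hypothetical names `hierarchical_attention`, `paragraph_scores` and `word_scores`. The final weight of a word is the product of its paragraph's sparsemax weight and its within-paragraph sparsemax weight.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


def hierarchical_attention(paragraph_scores, word_scores):
    """Two-level attention: weight(p, w) = top[p] * within[p][w].

    word_scores[p] holds the word scores of paragraph p.
    Returns the weights and the number of paragraphs actually evaluated.
    """
    top = sparsemax(paragraph_scores)
    weights, evaluated = [], 0
    for p_weight, scores in zip(top, word_scores):
        if p_weight == 0.0:
            # Pruned paragraph: its words cannot receive any weight,
            # so the word-level sparsemax is skipped entirely.
            weights.append([0.0] * len(scores))
            continue
        evaluated += 1
        weights.append([p_weight * w for w in sparsemax(scores)])
    return weights, evaluated


weights, evaluated = hierarchical_attention(
    [2.0, 1.5, -1.0],
    [[1.0, 0.8, -1.0], [0.5, 0.5], [3.0, 0.0]],
)
print(weights, evaluated)
```

In a real model the word scores of a pruned paragraph would not need to be computed at all, which is where a speedup could come from; whether it materializes in practice is the open question of the project.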
Sparse group lasso attention mechanism
- Problem: For structured data segmented into given "groups" (e.g. fields in a form, regions in an image, sentences in a paragraph), design a "group-sparse" attention mechanism that tends to give zero weight to entire groups when deemed not relevant enough.
- Method: a Sparse Group-Lasso penalty in a generalized structured attention framework [2]
- Notes: the L1 term is redundant when optimizing over the simplex, since the L1 norm is constant there; regular group lasso will already be sparse!
- References:
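The redundancy of the L1 term mentioned in the notes can be written out in one line. The notation below (weights $\lambda_1, \lambda_2$, group set $\mathcal{G}$, attention vector $p$ on the simplex $\triangle$) is assumed for illustration, and the usual per-group scaling factors are omitted.

```latex
\begin{align*}
\Omega(p) &= \lambda_1 \|p\|_1 + \lambda_2 \sum_{g \in \mathcal{G}} \|p_g\|_2,
  \qquad p \in \triangle = \Big\{ p : p_i \ge 0,\ \sum_i p_i = 1 \Big\}, \\
\|p\|_1 &= \sum_i |p_i| = \sum_i p_i = 1
  \quad\Longrightarrow\quad
  \Omega(p) = \lambda_1 + \lambda_2 \sum_{g \in \mathcal{G}} \|p_g\|_2 .
\end{align*}
```

On the simplex the L1 term only adds the constant $\lambda_1$, so it does not change the minimizer; only the group term shapes the solution.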
Geometrical structure: embedding ellipses instead of points
- Problem: Go beyond vector (point) embeddings: embed objects as ellipses instead of points; capture notions of inclusion/overlap.
- References:
Embed structured input data with graph-CNNs
- Problem: learn good fixed-size hidden representations for data that comes in graph format with different shapes and sizes.
- Method: Graph convolutional networks
- Data: arXiv macro usage, annotated semantic relationships datasets, paralex
- References:
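For orientation, the sketch below shows one graph convolution followed by mean pooling, which maps graphs of any size to a vector of the same length. It uses one common formulation, the symmetrically normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the function names and the dense list-of-lists representation are illustrative assumptions, and the weights would normally be learned.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    d_in, d_out = len(features[0]), len(weight[0])
    hidden = []
    for i in range(n):
        # Normalized aggregation of neighbour (and own) features.
        agg = [0.0] * d_in
        for j in range(n):
            if a_hat[i][j]:
                coef = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for f, x in enumerate(features[j]):
                    agg[f] += coef * x
        # Linear map followed by ReLU.
        hidden.append([
            max(0.0, sum(agg[f] * weight[f][k] for f in range(d_in)))
            for k in range(d_out)
        ])
    return hidden


def mean_pool(hidden):
    """Average node representations into one fixed-size graph vector."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


weight = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # hypothetical 2 -> 3 projection
small = mean_pool(gcn_layer([[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]], weight))
large = mean_pool(gcn_layer(
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    weight,
))
print(small, large)
```

Both graphs end up as 3-dimensional vectors even though they have 2 and 3 nodes respectively.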
Sparse link prediction
- Problem: Predicting links in a large structured graph. For instance: predict co-authorship, movie recommendation, coreference resolution, discourse relations between sentences in a document.
- Method: The simplest approach is independent binary classification: for every node pair (i, j), predict whether there is a link or not. This has two issues: the classes are highly imbalanced (most node pairs are not linked), and structure and higher-order correlations are ignored. Develop a method that addresses both: incorporate structural correlations (e.g. with combinatorial inference, constraints, latent variables) and account for imbalance, ideally via pairwise ranking losses: learn a scorer such that S(i, j) > S(k, l) if there is an edge (i, j) but no edge (k, l).
- Data: arXiv macro usage, Coreference in quizbowl
- Notes: Can graph-CNNs (previous idea) be useful here?
- References:
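One possible instance of the pairwise ranking objective described above is a margin-based hinge loss over all (linked pair, unlinked pair) combinations. The hinge form, the margin value and the function name below are assumptions made for illustration; other ranking losses would fit the same description.

```python
def pairwise_ranking_loss(scores, edges, margin=1.0):
    """Mean hinge loss pushing S(i, j) above S(k, l) by at least `margin`
    whenever (i, j) is an edge and (k, l) is not.

    scores: dict mapping a node pair to its score S(i, j).
    edges:  set of node pairs that are linked.
    """
    positives = [s for pair, s in scores.items() if pair in edges]
    negatives = [s for pair, s in scores.items() if pair not in edges]
    if not positives or not negatives:
        return 0.0
    total = sum(
        max(0.0, margin - (s_pos - s_neg))
        for s_pos in positives
        for s_neg in negatives
    )
    return total / (len(positives) * len(negatives))


scores = {(0, 1): 2.0, (0, 2): 0.5, (1, 2): -1.0}
print(pairwise_ranking_loss(scores, edges={(0, 1)}))  # well-ranked scorer
print(pairwise_ranking_loss(scores, edges={(0, 2)}))  # one violated pair
```

Because the loss only compares linked pairs against unlinked ones, it does not depend on the ratio of positives to negatives in the way a per-pair classification loss does. With many nodes, the unlinked pairs would have to be sampled rather than enumerated.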
Structured Prediction Energy Networks
- Problem: Structured output prediction with energy networks: replace discrete structured inference with continuous optimization in a neural net. Applications: multi-label classification; simple structured problems: sequence tagging, arc-factored parsing?
- Method: Learn a neural network E(x, y; w) to model the energy of an output configuration y (relaxed to be a continuous variable). Inference becomes min_y E(x, y; w). How far can this relaxation take us? Can it be better/faster than global combinatorial optimization approaches?
- Data: Sequence tagging, parsing, optimal matching?
- Notes: When E is a neural network, min_y E(x, y; w) is a non-convex optimization problem (possibly with mild constraints such as y in [0, 1]). Amos et al. have an approach that allows E to be a complicated neural net but remain convex in y. Is this beneficial? Are some kinds of structured data better suited for SPENs than others? E.g. sequence labelling seems "less structured" than dependency parsing.
- References:
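The inference step min_y E(x, y; w) with y relaxed to [0, 1] can be approximated by projected gradient descent. The sketch below assumes a hand-written toy energy with an analytic gradient in place of a neural network (where the gradient would come from automatic differentiation); `spen_inference`, the step size and the toy energy are all hypothetical.

```python
def spen_inference(grad_energy, y_init, step=0.1, n_steps=200):
    """Projected gradient descent for min_y E(y) subject to y in [0, 1]^L.

    grad_energy(y) must return the gradient of the energy with respect to y.
    """
    y = list(y_init)
    for _ in range(n_steps):
        g = grad_energy(y)
        # Gradient step, then projection back onto the box [0, 1]^L.
        y = [min(1.0, max(0.0, y_i - step * g_i)) for y_i, g_i in zip(y, g)]
    return y


# Toy energy for two labels (convex here, unlike a general neural energy):
#   E(y) = 0.5 * ||y||^2 - b . y + c * y_0 * y_1
# b plays the role of input-dependent local scores, c > 0 penalizes co-occurrence.
b, c = [2.0, 0.5], 0.5


def grad_toy_energy(y):
    return [y[0] - b[0] + c * y[1], y[1] - b[1] + c * y[0]]


y_hat = spen_inference(grad_toy_energy, y_init=[0.5, 0.5])
print([round(v) for v in y_hat])  # round the relaxed solution to discrete labels
```

For a non-convex energy the result depends on the initialization and step size, which is part of what the project would investigate.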