Differences
This shows you the differences between two versions of the page.
distributed_ml [2018/03/14 22:18] patra |
distributed_ml [2018/04/10 16:05] (current) damaskin |
||
---|---|---|---|
Line 4: | Line 4: | ||
=== Asynchronous ML on Android devices=== | === Asynchronous ML on Android devices=== | ||
- | This project is related to training ML algorithms asynchronously on Android devices. The challenges here are primarily: mobile churn, latency, memory, bandwidth and accuracy. The main goal is building a framework to address these challenges. | + | This project is related to training ML algorithms asynchronously on Android devices. The challenges here are primarily: mobile churn, latency, energy consumption, memory, bandwidth and accuracy. |
+ | This project involves multiple semester projects that tackle subsets of these challenges from both the algorithmic (SGD variants) and the systems (framework for Android) perspectives. | ||
Related papers:\\ | Related papers:\\ | ||
- | [1] __[[http://ttic.uchicago.edu/~kgimpel/papers/gimpel+das+smith.conll10.pdf|Distributed Asynchronous Online Learning for Natural Language Processing]]__ \\ | + | [1] __[[http://net.pku.edu.cn/~cuibin/Papers/2017%20sigmod.pdf|Heterogeneity-aware Distributed Parameter Servers]]__ \\ |
- | [2] __[[http://net.pku.edu.cn/~cuibin/Papers/2017%20sigmod.pdf|Heterogeneity-aware Distributed Parameter Servers]]__ | + | [2] __[[http://proceedings.mlr.press/v70/zhang17e.html|ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning]]__ |
- | + | ||
- | === Multi-output multi-class classification === | + | |
- | The goal of this project is to design a distributed ML algorithm suitable for multi-output classification (e.g. music tag prediction on mobile devices). Deep learning-based approaches seem promising for this task. Nevertheless, current methods target only single-output classification. | + | |
- | + | ||
- | Related papers:\\ | + | |
- | [1] __[[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf|Deep Neural Networks for YouTube Recommendations]]__ \\ | + | |
- | [2] __[[http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf|Deep content-based music recommendation]]__ \\ | + | |
- | [3] __[[http://www.columbia.edu/~jwp2128/Papers/LiangPaisleyEllis2014.pdf|Codebook-based scalable music tagging with Poisson matrix factorization]]__ | + | |
===Personalized/Private ML in P2P network=== | ===Personalized/Private ML in P2P network=== | ||
Line 27: | Line 19: | ||
[2] __[[https://www.cs.cornell.edu/~shmat/shmat_ccs15.pdf|Privacy-Preserving Deep Learning]]__\\ | [2] __[[https://www.cs.cornell.edu/~shmat/shmat_ccs15.pdf|Privacy-Preserving Deep Learning]]__\\ | ||
[3] __[[http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45428.pdf|Deep Learning with Differential Privacy]]__ | [3] __[[http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45428.pdf|Deep Learning with Differential Privacy]]__ | ||
+ | |||
+ | ===P2P data market=== | ||
+ | The goal is the design of a P2P infrastructure that enables service providers (peers) to buy and sell data. | ||
+ | The main challenge for a candidate scheme is the definition and measurement of the data utility from the perspective of each peer. | ||
+ | The revenue model and privacy guarantees are also two important challenges for this setting. | ||
+ | |||
+ | Related papers:\\ | ||
+ | [1] __[[http://www.cs.utexas.edu/users/shmat/shmat_kdd08.pdf|The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing]]__\\ | ||
+ | [2] __[[http://www.vldb.org/pvldb/vol9/p1695-upadhyaya.pdf|Price-Optimal Querying with Data APIs]]__\\ | ||
+ | [3] __[[http://pages.cs.wisc.edu/~paris/papers/data_pricing.pdf|Query-Based Data Pricing]]__\\ | ||
===Federated optimization: distributed SGD with fault tolerance=== | ===Federated optimization: distributed SGD with fault tolerance=== | ||
Line 35: | Line 37: | ||
[2] __[[https://arxiv.org/pdf/1401.2753.pdf|Stochastic Optimization with Importance Sampling]]__ | [2] __[[https://arxiv.org/pdf/1401.2753.pdf|Stochastic Optimization with Importance Sampling]]__ | ||
+ | ===Byzantine-tolerant machine learning=== | ||
+ | Each node in the distributed setting can exhibit arbitrary (Byzantine) behaviour during the learning procedure. | ||
+ | This project explores algorithms (SGD variants) in both the synchronous and the asynchronous setups. | ||
+ | The student will implement these algorithms in our code base, which is built on top of TensorFlow. | ||
- | ===P2P data market=== | + | Related papers:\\ |
- | The goal is the design of a P2P infrastructure that enables service providers (peers) to buy and sell data. | + | [1] __[[https://arxiv.org/pdf/1802.07928.pdf|Asynchronous Byzantine Machine Learning]]__\\ |
- | The main challenge for a candidate scheme is the definition and measurement of the data utility from the perspective of each peer. | + | [2] __[[http://papers.nips.cc/paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent|Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent]]__ \\ |
- | The revenue model and privacy guarantees are also two important challenges for this setting. | + | |
+ | ===Black-Box attacks against recommender systems=== | ||
+ | A recommender system can be viewed as a black box that users query with feedback (e.g., ratings, clicks) before getting the output list of recommendations. | ||
+ | The goal is to infer properties of the recommendation algorithm by observing the output from different queries. | ||
Related papers:\\ | Related papers:\\ | ||
- | [1] __[[http://www.cs.utexas.edu/users/shmat/shmat_kdd08.pdf|The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing]]__\\ | + | [1] __[[https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf|Stealing Machine Learning Models via Prediction APIs]]__\\ |
- | [2] __[[http://www.vldb.org/pvldb/vol9/p1695-upadhyaya.pdf|Price-Optimal Querying with Data APIs]]__\\ | + | [2] __[[https://arxiv.org/pdf/1602.02697v3.pdf|Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples]]__\\ |
- | [3] __[[http://pages.cs.wisc.edu/~paris/papers/data_pricing.pdf|Query-Based Data Pricing]]__\\ | + | |
+ | |||
+ | === Multi-output multi-class classification === | ||
+ | The goal of this project is to design a distributed ML algorithm suitable for multi-output classification (e.g. music tag prediction on mobile devices). Deep learning-based approaches seem promising for this task. Nevertheless, current methods target only single-output classification. | ||
+ | |||
+ | Related papers:\\ | ||
+ | [1] __[[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf|Deep Neural Networks for YouTube Recommendations]]__ \\ | ||
+ | [2] __[[http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf|Deep content-based music recommendation]]__ \\ | ||
+ | [3] __[[http://www.columbia.edu/~jwp2128/Papers/LiangPaisleyEllis2014.pdf|Codebook-based scalable music tagging with Poisson matrix factorization]]__ | ||
**Contact:** __[[http://people.epfl.ch/georgios.damaskinos|Georgios Damaskinos]]__ | **Contact:** __[[http://people.epfl.ch/georgios.damaskinos|Georgios Damaskinos]]__ | ||
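
As an illustration of the "SGD variants" mentioned under the Byzantine-tolerant machine learning topic, the following is a minimal sketch of a Krum-style aggregation rule, the approach described in the listed paper "Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent". The page does not say which algorithms the project actually uses, so this is only an assumed example. The names `krum`, `sgd_step`, `num_byzantine` and `learning_rate` are hypothetical, and the sketch uses plain Python lists rather than the group's TensorFlow code base.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two gradient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(gradients: List[Vector], num_byzantine: int) -> Vector:
    """Select one gradient with a Krum-style rule (illustrative sketch).

    Each candidate is scored by the sum of squared distances to its
    n - f - 2 closest peers; the candidate with the lowest score is
    returned. Here n is the number of workers and f (num_byzantine) is
    the assumed upper bound on Byzantine workers.
    """
    n = len(gradients)
    if 2 * num_byzantine + 2 >= n:
        raise ValueError("Krum assumes 2f + 2 < n")
    neighbours = n - num_byzantine - 2

    best_index, best_score = 0, float("inf")
    for i, candidate in enumerate(gradients):
        distances = sorted(
            squared_distance(candidate, other)
            for j, other in enumerate(gradients)
            if j != i
        )
        score = sum(distances[:neighbours])
        if score < best_score:
            best_index, best_score = i, score
    return gradients[best_index]


def sgd_step(params: Vector, gradients: List[Vector],
             num_byzantine: int, learning_rate: float) -> List[float]:
    """One synchronous SGD update using the selected gradient."""
    selected = krum(gradients, num_byzantine)
    return [p - learning_rate * g for p, g in zip(params, selected)]


if __name__ == "__main__":
    # Four roughly agreeing workers and one arbitrary (Byzantine) worker.
    worker_gradients = [
        [1.0, 1.0],
        [1.1, 0.9],
        [0.9, 1.1],
        [1.0, 1.05],
        [100.0, -100.0],
    ]
    print(krum(worker_gradients, num_byzantine=1))
    print(sgd_step([0.0, 0.0], worker_gradients, 1, 0.1))
```

Selecting a single gradient instead of averaging keeps one arbitrary worker from dragging the update far from the honest majority, at the cost of discarding information from the other workers. An asynchronous variant would need a different filtering scheme, since the server does not see all gradients at once.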