=== Asynchronous ML on Android devices ===
This project targets training ML algorithms asynchronously on Android devices. The challenges here are primarily: mobile churn, latency, energy consumption, memory, bandwidth and accuracy.
It comprises multiple semester projects that tackle subsets of these challenges from the algorithmic (SGD variants) and the systems (framework for Android) perspective.
  
Related papers:\\
[1] __[[http://net.pku.edu.cn/~cuibin/Papers/2017%20sigmod.pdf|Heterogeneity-aware Distributed Parameter Servers]]__\\
[2] __[[http://proceedings.mlr.press/v70/zhang17e.html|ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning]]__
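
As a minimal, hedged illustration of the asynchronous update pattern this project targets, the sketch below simulates devices with Python threads that push lock-free (Hogwild-style) SGD updates into a shared model. The toy least-squares data, learning rate and thread count are placeholders, not part of the project's actual framework.
<code python>
# Hedged sketch: asynchronous SGD where simulated "devices" (threads) update a
# shared parameter vector without coordination, so updates may be stale.
import threading
import numpy as np

d, n_devices, steps, lr = 10, 4, 2000, 0.01
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(1000, d))
w_true = data_rng.normal(size=d)
y = X @ w_true + 0.01 * data_rng.normal(size=1000)

w = np.zeros(d)  # shared model, read and written concurrently by all devices

def device_worker(device_id):
    global w
    rng = np.random.default_rng(device_id)   # each device samples its own points
    for _ in range(steps):
        i = rng.integers(0, len(X))
        grad = (X[i] @ w - y[i]) * X[i]       # gradient of 0.5 * (x_i.w - y_i)^2
        w -= lr * grad                        # lock-free, possibly stale update

threads = [threading.Thread(target=device_worker, args=(k,)) for k in range(n_devices)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to true model:", np.linalg.norm(w - w_true))
</code>
In the actual setting, the threads would be replaced by Android devices talking to a parameter server, which is where churn, latency, energy and bandwidth constraints come into play.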
  
=== Personalized/Private ML in P2P network ===
[2] __[[https://www.cs.cornell.edu/~shmat/shmat_ccs15.pdf|Privacy-Preserving Deep Learning]]__\\
[3] __[[http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45428.pdf|Deep Learning with Differential Privacy]]__

=== P2P data market ===
The goal is to design a P2P infrastructure that enables service providers (peers) to buy and sell data.
The main challenge for a candidate scheme is the definition and measurement of data utility from the perspective of each peer.
The revenue model and the privacy guarantees are two further important challenges in this setting.

Related papers:\\
[1] __[[http://www.cs.utexas.edu/users/shmat/shmat_kdd08.pdf|The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing]]__\\
[2] __[[http://www.vldb.org/pvldb/vol9/p1695-upadhyaya.pdf|Price-Optimal Querying with Data APIs]]__\\
[3] __[[http://pages.cs.wisc.edu/~paris/papers/data_pricing.pdf|Query-Based Data Pricing]]__
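
One hedged illustration of what a per-peer utility measure could look like: value a candidate dataset by how much it reduces the buying peer's validation loss on its own task. The ridge-regression model and all names below are illustrative assumptions, not a proposed design.
<code python>
# Toy utility measure: a buyer values candidate data by the validation-loss
# reduction it yields on the buyer's own task (here: ridge regression).
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_loss(w, X_val, y_val):
    return float(np.mean((X_val @ w - y_val) ** 2))

def data_utility(buyer_train, buyer_val, candidate):
    """Positive utility means the candidate data is worth buying for this peer."""
    X0, y0 = buyer_train
    Xc, yc = candidate
    X_val, y_val = buyer_val
    before = val_loss(fit_ridge(X0, y0), X_val, y_val)
    after = val_loss(fit_ridge(np.vstack([X0, Xc]),
                               np.concatenate([y0, yc])), X_val, y_val)
    return before - after
</code>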
  
=== Federated optimization: distributed SGD with fault tolerance ===
[2] __[[https://arxiv.org/pdf/1401.2753.pdf|Stochastic Optimization with Importance Sampling]]__
  
=== Byzantine-tolerant machine learning ===
Each node in the distributed setting can exhibit arbitrary (Byzantine) behaviour during the learning procedure.
This project explores algorithms (SGD variants) in both the synchronous and the asynchronous setup.
The student will work on our TensorFlow-based code base to implement these algorithms.
  
Related papers:\\
[1] __[[https://arxiv.org/pdf/1802.07928.pdf|Asynchronous Byzantine Machine Learning]]__\\
[2] __[[http://papers.nips.cc/paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent|Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent]]__
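
To give a flavour of the aggregation rules studied in related paper [2] above, here is a hedged NumPy sketch of a Krum-style rule that keeps the submitted gradient closest to its n - f - 2 nearest peers; the toy gradients and the Byzantine bound f are illustrative only.
<code python>
# Hedged sketch of a Krum-style Byzantine-tolerant aggregation rule: select the
# gradient whose summed squared distance to its n - f - 2 closest peers is
# smallest, discarding outliers sent by Byzantine workers.
import numpy as np

def krum(gradients, f):
    n = len(gradients)
    assert n - f - 2 > 0, "Krum requires n > f + 2"
    G = np.stack(gradients)
    dists = np.sum((G[:, None, :] - G[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    scores = []
    for i in range(n):
        others = np.delete(dists[i], i)                  # distances to the other workers
        scores.append(np.sort(others)[: n - f - 2].sum())
    return gradients[int(np.argmin(scores))]

# Toy example: five honest workers and one Byzantine worker sending garbage.
rng = np.random.default_rng(0)
honest = [rng.normal(loc=1.0, scale=0.1, size=5) for _ in range(5)]
byzantine = [np.full(5, 1e3)]
print(krum(honest + byzantine, f=1))  # close to the honest gradients
</code>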

=== Black-Box attacks against recommender systems ===
A recommender system can be viewed as a black box that users query with feedback (e.g., ratings, clicks) before getting the output list of recommendations.
The goal is to infer properties of the recommendation algorithm by observing the outputs of different queries.
  
Related papers:\\
[1] __[[https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf|Stealing Machine Learning Models via Prediction APIs]]__\\
[2] __[[https://arxiv.org/pdf/1602.02697v3.pdf|Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples]]__
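
As a hedged, toy illustration of the kind of probing involved, the sketch below submits two feedback histories that differ by one click to a black-box recommender and measures how much the returned top-k list changes; the toy_recommend stand-in and all parameters are assumptions for illustration.
<code python>
# Toy black-box probe: how sensitive are the recommendations to one extra click?
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def sensitivity_probe(recommend, history, extra_item, k=10):
    """1.0 = completely different list after one extra click, 0.0 = unchanged."""
    before = recommend(history, k)
    after = recommend(history + [extra_item], k)
    return 1.0 - jaccard(before, after)

# Stand-in black box: recommends the most popular unseen items, ignoring order.
POPULARITY = list(range(100))  # item 0 is the most popular
def toy_recommend(history, k):
    return [item for item in POPULARITY if item not in history][:k]

print(sensitivity_probe(toy_recommend, history=[0, 1, 2], extra_item=3))
</code>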

=== Multi-output multi-class classification ===
The goal of this project is to design a distributed ML algorithm suitable for multi-output classification (e.g., music tag prediction on mobile devices). Deep learning-based approaches seem promising for this task. Nevertheless, current methods target only single-output classification.

Related papers:\\
[1] __[[https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf|Deep Neural Networks for YouTube Recommendations]]__\\
[2] __[[http://papers.nips.cc/paper/5004-deep-content-based-music-recommendation.pdf|Deep content-based music recommendation]]__\\
[3] __[[http://www.columbia.edu/~jwp2128/Papers/LiangPaisleyEllis2014.pdf|Codebook-based scalable music tagging with Poisson matrix factorization]]__
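
For reference, below is a minimal, non-distributed sketch of a multi-output (multi-label) classifier in tf.keras: one independent sigmoid output per tag with a per-tag binary cross-entropy loss. The feature dimension, tag count and layer sizes are placeholder assumptions; distributing the training of such a model is the actual subject of the project.
<code python>
# Minimal multi-output classifier sketch: one sigmoid output per tag, so several
# tags can be predicted for the same input (e.g. a music clip).
import tensorflow as tf

FEATURE_DIM = 128   # placeholder: dimensionality of the input audio features
NUM_TAGS = 50       # placeholder: number of music tags to predict

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(FEATURE_DIM,)),
    tf.keras.layers.Dense(NUM_TAGS, activation="sigmoid"),  # independent per-tag outputs
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",   # per-tag binary loss instead of a single softmax
    metrics=["binary_accuracy"],
)
model.summary()
</code>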
  
**Contact:** __[[http://people.epfl.ch/georgios.damaskinos|Georgios Damaskinos]]__