Reliable Distributed Clustering with Redundant Data Assignment

Clustering is one of the most basic unsupervised learning tools for inferring informative patterns in data. In this paper, Venkata Gandikota†, Arya Mazumdar†, and Ankit Singh Rawat‡ present distributed generalized clustering algorithms that can handle large-scale data spread across multiple machines, even when some of those machines are slow (straggling) or unreliable. They propose a novel data assignment scheme that makes it possible to obtain global information about the entire dataset even when some machines fail to return the results of their assigned local computations. This assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.
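To make the idea of redundant data assignment concrete, here is a toy sketch. It assumes a simple cyclic replication scheme, where each data part is stored on `redundancy` consecutive machines. This is a stand-in for illustration and not necessarily the assignment the authors propose. It also uses a global mean in place of the clustering computations described above. All function names (`cyclic_assignment`, `covered_parts`, `local_summaries`, `global_mean`) are hypothetical. The point is that if at most `redundancy - 1` machines straggle, every part still has a responding copy, so the server can recover an exact global quantity.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def cyclic_assignment(num_parts: int, num_machines: int,
                      redundancy: int) -> Dict[int, List[int]]:
    """Hypothetical cyclic scheme: part p is stored on machines
    p, p+1, ..., p+redundancy-1 (mod num_machines)."""
    if not 1 <= redundancy <= num_machines:
        raise ValueError("redundancy must be between 1 and num_machines")
    assignment: Dict[int, List[int]] = {m: [] for m in range(num_machines)}
    for p in range(num_parts):
        for j in range(redundancy):
            # redundancy <= num_machines, so these machines are distinct
            assignment[(p + j) % num_machines].append(p)
    return assignment


def covered_parts(assignment: Dict[int, List[int]],
                  responders: Iterable[int]) -> Set[int]:
    """Parts held by at least one machine that responded."""
    covered: Set[int] = set()
    for m in responders:
        covered.update(assignment[m])
    return covered


def local_summaries(parts: Sequence[Sequence[float]],
                    stored: List[int]) -> Dict[int, Tuple[float, int]]:
    """A machine's local computation: (sum, count) for each part it holds."""
    return {p: (sum(parts[p]), len(parts[p])) for p in stored}


def global_mean(reports: List[Dict[int, Tuple[float, int]]],
                num_parts: int) -> float:
    """Merge reports from responding machines and compute the global mean."""
    merged: Dict[int, Tuple[float, int]] = {}
    for report in reports:
        # Replicas of a part yield identical summaries, so overwriting is safe
        # and each part is counted exactly once.
        merged.update(report)
    if len(merged) < num_parts:
        raise RuntimeError("too many stragglers: some parts are not covered")
    total = sum(s for s, _ in merged.values())
    count = sum(c for _, c in merged.values())
    return total / count


if __name__ == "__main__":
    parts = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0], [8.0, 9.0]]
    assignment = cyclic_assignment(num_parts=5, num_machines=5, redundancy=3)
    responders = [0, 2, 3]  # machines 1 and 4 straggle
    reports = [local_summaries(parts, assignment[m]) for m in responders]
    print(global_mean(reports, num_parts=len(parts)))  # 5.0
```

With `redundancy=3`, any two stragglers leave every part covered. The price is that each machine stores and processes three parts instead of one. Real schemes trade off this storage and computation overhead against the number of failures tolerated and the quality of the approximation.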

  • Myself

    As an Investment Consultant and Specialist, Pompeo Pontone is a Professional Investor with 25 years’ experience in the fields of Investment Management, Quantitative Finance & Derivatives Trading and Data Science.
