Really Big Data Analytics on Graphs with Trillions of Edges
Abstract
Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach: processing graphs from secondary storage. While this comes with a performance penalty, it makes analytics on very large graphs feasible on a small number of commodity machines. We have developed two systems, one for a single machine and one for a cluster of machines. X-Stream, the single-machine solution, aims to make all secondary storage access sequential. It uses two techniques to achieve this goal: edge-centric processing and streaming partitions. Chaos, the cluster solution, starts from the observation that there is little benefit to locality when accessing data from secondary storage over a high-speed network. As a result, Chaos spreads graph data uniformly at random over storage devices and uses randomized access to achieve I/O balance. Chaos furthermore uses work stealing to achieve computational load balance. By using these techniques, it avoids the need for expensive partitioning during pre-processing while still achieving good scaling behavior. With Chaos we have been able to process an 8-trillion-edge graph on 32 machines, a new milestone for graph size on a small cluster. I will describe both systems and their performance on a number of benchmarks and in comparison to state-of-the-art alternatives. This is joint work with Laurent Bindschaedler (EPFL), Jasmina Malicevic (EPFL) and Amitabha Roy (Intel Labs).
Speaker: Willy Zwaenepoel
Audience: All
Date: Tuesday, March 22, 2016
Venue: Jussieu Amphi 25

- Le temps et les événements en informatique
- Le génie mathématique, du théorème des quatre couleurs à la classification des groupes
- Unifying logic and probability: A “New Dawn” for Artificial Intelligence?
- Taking Education Online: A Unique Opportunity for the New Millenium
- Lauréat prix Gilles Kahn 2012 : Camille Couprie
- Lauréat prix Gilles Kahn 2012 : Mathilde Noual
- Lauréat prix Gilles Kahn 2012 : Mathieu Feuillet
- Gilles Dowek: Are formal methods the future of air traffic control?
- Tony Hoare: Laws of concurrent system design
- On the Preservation of Digital Information
- L'information mentale
- Myths about MOOCs and Software Engineering Education
- Toward a Theory of Trust in Networks of Humans and Computers
- Computer Science: All Questions Answered
- Fluidization of discrete event models or a marriage between the discrete and the continuous
- Vers des bases de connaissances personnelles
- The new era of biology is computational
- Proofs, Secrets, and Computation
- Desperately seeking software perfection
- Les informaticiennes, de la dominance de classe aux discriminations de sexe
- Cybersecurity and network measurement : problematic in so many ways
- Really Big Data Analytics on Graphs with Trillions of Edges
- Robots that exceed human capabilities
- "Chirps" everywhere
- Abstract interpretation
- What Makes Digital Inclusion Good Or Bad?
Available videos
- http://colloquium.lip6.fr