Summer School
Lab session files
M2 Research
2017-2018
- Introduction to Map-Reduce
- In-memory processing with Spark
- Stream processing
- Spark lab session (Tweet analysis) and code/data files
- Spark assignment (Flickr analysis), deadline: 15 December
- NoSQL databases
- Neo4j lab session with additional dataset
- Recommender systems
- Pattern mining
- Recommender systems lab session, code & data
Previous years
- Data management in large-scale distributed systems: Introduction
- Distributed DBMS Architecture
- Distributed Database design: fragmentation & allocation
- Distributed Query Evaluation
- Background: Transactions
- Distributed Transactions
- Replication
- Introduction to MapReduce
- In-Memory Processing with Spark
- MapReduce practical work: subject and Eclipse project (rename to .zip)
- NoSQL databases
- Recommender systems (parts 1, 2, 3, 4)
- Clustering (parts 1, 2, 3)
- Frequent Itemset Mining
- Spark practical work and Spark project skeleton (rename it to .zip to decompress)
Mastère Spécialisé Big Data
- Introduction to Map-Reduce
- Hadoop MapReduce lab: subject and project skeleton
- Introduction to Spark
- Streaming
- Spark lab session (streaming) and project skeleton
- Recommender systems: subject, code and data
Streaming the tweet file into Kafka (topic batch_tweets, one line every 0.1 s):
./fileProducer.sh ../1Mtweets_en.txt .1 | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic batch_tweets
M2 MIAGE / M2 PGI
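fileProducer.sh:
#!/bin/sh
# Usage: ./fileProducer.sh FILE DELAY
# Replays FILE line by line forever, sleeping DELAY seconds between lines
# (e.g. .1 for one line every 0.1 s; fractional delays need a sleep that
# accepts them, such as GNU coreutils sleep).
while true
do
  # IFS= and -r keep each line verbatim; the -n test also emits a last
  # line that lacks a trailing newline
  while IFS= read -r p || [ -n "$p" ]; do
    printf '%s\n' "$p"   # quoted: no word splitting or globbing of tweet text
    sleep "$2"
  done < "$1"
done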
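Because fileProducer.sh loops forever, it is awkward to check on its own. The following is a minimal sketch of a bounded variant, assuming a POSIX sh with mktemp available; the function name replay_file and its PASSES argument are hypothetical, not part of the lab files. It replays a file a fixed number of times, so it can be tried on a tiny sample without Kafka.

```shell
#!/bin/sh
# Hypothetical bounded variant of the replay loop in fileProducer.sh:
# replay_file FILE DELAY PASSES prints FILE line by line PASSES times,
# sleeping DELAY seconds after each line.
replay_file() {
  file=$1
  delay=$2
  passes=$3
  i=0
  while [ "$i" -lt "$passes" ]; do
    # IFS= and -r keep each line verbatim; the -n test also emits a last
    # line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
      printf '%s\n' "$line"
      sleep "$delay"
    done < "$file"
    i=$((i + 1))
  done
}

# Local dry run on a two-line sample instead of ../1Mtweets_en.txt;
# the '*' checks that tweet text is not expanded as a glob
sample=$(mktemp)
printf 'first tweet\nsecond * tweet\n' > "$sample"
out=$(replay_file "$sample" 0 2)
rm -f "$sample"
printf '%s\n' "$out"
```

Piping `replay_file ../1Mtweets_en.txt .1 1` into bin/kafka-console-producer.sh, as done with fileProducer.sh, would send a single pass of the tweets to the topic and then stop.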
- Introduction to Map-Reduce
- Hadoop MapReduce lab: subject and project skeleton
- In-memory processing with Spark
- Stream processing
- Spark lab session (Tweet analysis) and code/data files
- Hadoop debrief
Ensimag ISI
- Introduction to Map-Reduce
- Hadoop MapReduce lab: subject, project skeleton and 1M sample
- In-memory processing with Spark
- Stream processing
- Spark lab session (Tweet analysis) and code/data. If possible, do the “getting started” part before the lab session to save time.