Big Data Workshop with Apache Spark [🇪🇸]

Material for the Big Data Workshop

Contents

Infrastructure

The workshop simulates a production installation using Docker containers. docker-compose.yml contains the definitions and configuration for these services and their respective UIs:

Apache Spark: Spark Master UI | Job Progress
Apache Kafka
Postgres
Superset: our dashboard

Each service is exposed on its default port, e.g. Spark master: 7077, Postgres: 5432.
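Once the environment is running, a quick way to confirm that the services are reachable is to probe their ports. The following is a minimal sketch, not part of the workshop scripts: `check_port` is a hypothetical helper, it relies on bash's `/dev/tcp` redirection, and it assumes the ports are published on localhost. Only ports named in this README are probed, and 4040 only answers while a Spark application such as spark-shell is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "open" if a TCP connection to host:port succeeds,
# "closed" otherwise. Uses bash's /dev/tcp pseudo-device inside a subshell so
# the file descriptor is released as soon as the check finishes.
check_port() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Assumption: these ports are published on localhost by docker-compose.yml.
# 7077 = Spark master, 8080 = Spark Master UI, 4040 = Spark UI, 5432 = Postgres.
for port in 7077 8080 4040 5432; do
  echo "localhost:${port} $(check_port localhost "${port}")"
done
```

A port reported as closed right after `./control-env.sh start` may simply mean that the container is still starting, so it is worth retrying after a few seconds.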

Start the environment

Install the environment by following the instructions in INSTALL.md.

Run the script that starts the environment. Usage: `control-env.sh (start|stop|cleanup)`:

./control-env.sh start

**IMPORTANT**: `control-env.sh cleanup` deletes any data that has been processed previously.


# Access Spark-Master and run spark-shell
docker exec -it master bash
root@588acf96a879:/app# spark-shell

Try it:

val file = sc.textFile("/dataset/yahoo-symbols-201709.csv")
file.count
file.take(10).foreach(println)

Open the Spark Master UI at http://localhost:8080 and the Spark UI at http://localhost:4040.

Troubleshooting

If jobs die (KILLED) before completing, the cause may be the memory available to Docker. Increase the memory allocated to the Docker process to more than 8 GB.
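To see how much memory Docker currently has, the daemon can be queried from the command line. This is a rough sketch under stated assumptions: `meets_min_gib` is a hypothetical helper, the 8 GiB threshold is taken from the note above, and `docker info --format '{{.MemTotal}}'` is used to read the total memory visible to the daemon in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a byte count is at least the given number of GiB.
meets_min_gib() {
  local bytes="$1" min_gib="$2"
  [ "$bytes" -ge $(( min_gib * 1024 * 1024 * 1024 )) ]
}

# Assumption: 8 GiB is used as the threshold, following the troubleshooting note.
min_gib=8

if command -v docker >/dev/null 2>&1; then
  # MemTotal is the total memory visible to the Docker daemon, in bytes.
  mem_bytes="$(docker info --format '{{.MemTotal}}' 2>/dev/null || true)"
  case "$mem_bytes" in
    ''|*[!0-9]*)
      # Empty or non-numeric output, e.g. when the daemon is not running.
      echo "Could not read memory from docker info"
      ;;
    *)
      if meets_min_gib "$mem_bytes" "$min_gib"; then
        echo "Docker has at least ${min_gib} GiB available"
      else
        echo "Docker has less than ${min_gib} GiB; consider raising the limit"
      fi
      ;;
  esac
else
  echo "docker command not found"
fi
```

The check is only indicative: it reports what the daemon sees, not how much memory each Spark worker is actually granted.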

Further reading

Acknowledgements

Juan Pampliega (MuttData): expanded and updated the Spark Streaming example
Pedro Ferrari (MuttData): created the PySpark notebook with Machine Learning

wgonzalez25/bigdata-workshop-es

Folders and files

Latest commit

History

Repository files navigation

Workshop de Big Data con Apache Spark [🇪🇸]

Contenidos

Infrastructura

Levantar ambiente

Troubleshooting

Siga leyendo

Agradecimientos

Sobre

About

Resources

License

Stars

Watchers

Forks

Languages