% --------------- WordCount using Spark Shell

% create a working directory
cd ~
mkdir -p mySpark/WordCount
cd mySpark/WordCount

% copy sample input data
cp /home/NDBI040/spark/movies.txt ~/mySpark/WordCount

% open Spark Shell (Scala environment)
spark-shell

% read the input file using the Scala API and create an RDD
val data = sc.textFile("movies.txt")

% execute the WordCount transformation, i.e. split each line into words, map each word to a (word, 1) pair, and reduce the pairs by key (summing the counts per word)
val result = data.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)

% apply the action, i.e. store the result of all the transformations into a text file
result.saveAsTextFile("output")

% merge the result into a single local array (only if small enough)
val merge_result = result.collect()

% quit and check the result
:quit
cat output/part-00000
cat output/part-00001
rm -r output

% --------------- WordCount using Java

% copy /home/NDBI040/spark/WordCountJava
% move to the folder with pom.xml and build
cd ~
mkdir -p mySpark
cp -r /home/NDBI040/spark/WordCountJava ~/mySpark
cd mySpark/WordCountJava
mvn clean install

% check the result in subfolder target
% move the file /home/NDBI040/spark/movies.txt there
% run the task
cp /home/NDBI040/spark/movies.txt ~/mySpark/WordCountJava/target
cd target
spark-submit --class WordCount --master local WordCountJava-1.0.jar movies.txt output2

% check output2
cat output2/part-00000

% -----------------

% copy /home/NDBI040/spark/SparkSQLJava
% move to the folder with pom.xml and build
cd ~
cp -r /home/NDBI040/spark/SparkSQLJava ~/mySpark
cd mySpark/SparkSQLJava
mvn clean install

% check the result in subfolder target
% move the file /home/NDBI040/spark/actors.json there
% run the task
cp /home/NDBI040/spark/actors.json ~/mySpark/SparkSQLJava/target
cd target
spark-submit --class ActorsSpark --master local SparkSQLJava-1.0.jar

% -----------------

% copy /home/NDBI040/spark/PiEstimation
% move to the folder with pom.xml and build
mvn clean install

% check the result in subfolder target
% run the task
spark-submit --class PiEstimation --master local ndbi040-piEstimation-1.0.jar

% check output*
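
% -----------------

% The WordCount transformation used earlier (split each line into words, map each
% word to a (word, 1) pair, reduce by key) can be tried without a Spark installation.
% The sketch below is plain Java using java.util.stream, not the Spark API and not
% the code shipped in WordCountJava; the class name WordCountSketch, the method
% count and the two sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; it does not depend on Spark.
public class WordCountSketch {

    // Counts the words in the given lines, following the same three steps
    // as the Spark Shell pipeline.
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap: split each line into words (same separator as in the Spark example)
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // map + reduceByKey: each word contributes 1, values with the same key are summed;
                // a TreeMap keeps the keys sorted so the printed order is stable
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for movies.txt
        List<String> lines = List.of("a b a", "b c");
        count(lines).forEach((word, n) -> System.out.println("(" + word + "," + n + ")"));
        // prints:
        // (a,2)
        // (b,2)
        // (c,1)
    }
}
```

% Unlike Spark, this version runs in a single JVM and keeps all counts in memory,
% so it is only suitable for checking the logic on small inputs.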