PPPL-5311

Analyzing Large Data Sets from XGC1 Magnetic Fusion Simulations Using Apache Spark

Authors: R. Michael Churchill

Abstract: Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. An unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not share common coherent structures but instead exhibit broad, ring-like structures in velocity space.
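As a rough illustration of the clustering step named in the abstract, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It is not the paper's Spark implementation. The function names `kmeans` and `sq_dist`, the deterministic "first k points" initialization, and the toy vectors are all hypothetical. Each vector stands in for a flattened velocity-space distribution function, and each returned centroid plays the role of a representative distribution shape for its cluster. A Spark version (for example, MLlib's KMeans) would distribute the same assign-and-update steps across executors.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means; hypothetical deterministic init uses the first k points as centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid unchanged
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


# Toy two-component vectors standing in for flattened distribution functions (hypothetical data).
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The deterministic initialization keeps the example reproducible. Practical runs on real distribution-function data would normally use randomized or k-means++ seeding and compare several values of k.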

Submitted to: IEEE NYSDS Conference Proceedings

Full text: PPPL-5311 (PDF, 1.4 MB, 6 pp.)