Analyzing Large Data Sets from XGC1 Magnetic Fusion Simulations Using Apache Spark
Authors: R. Michael Churchill
Abstract: Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. An unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not exhibit common coherent structures, but rather broad, ring-like structures in velocity space.
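The abstract does not specify the XGC1 file layout or the clustering configuration, so the following is only a minimal, single-machine sketch of the two steps it names: decoding fixed-length binary records into feature vectors, and grouping those vectors with k-means (Lloyd's algorithm). The record layout (a 4 x 4 velocity grid of little-endian float64 values), the constants `NV_PAR`, `NV_PERP`, `N_CELLS` and `RECORD_LENGTH`, and the functions `decode_record` and `kmeans` are hypothetical illustrations, not the paper's code.

```python
import random
import struct

# HYPOTHETICAL record layout (assumed for illustration, not XGC1's format):
# each record holds one distribution function f(v_par, v_perp) sampled on an
# NV_PAR x NV_PERP velocity-space grid as little-endian float64 values.
NV_PAR, NV_PERP = 4, 4
N_CELLS = NV_PAR * NV_PERP
RECORD_LENGTH = N_CELLS * 8  # bytes per fixed-length record


def decode_record(raw):
    """Unpack one fixed-length binary record into a flat feature vector."""
    return list(struct.unpack("<%dd" % N_CELLS, raw))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's-algorithm k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k input points drawn without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

In a Spark setting, one plausible mapping is to load the records with `sc.binaryRecords(path, RECORD_LENGTH)`, map each one through a decoder like `decode_record`, and replace the local loop with Spark MLlib's distributed `pyspark.ml.clustering.KMeans`; the abstract does not say whether the paper follows exactly this route.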
Submitted to: IEEE NYSDS Conference Proceedings
Report: PPPL-5311 (PDF, 1.4 MB, 6 pp.)