Analyzing Large Data Sets from XGC1 Magnetic Fusion Simulations Using Apache Spark
      
Authors: R. Michael Churchill
    
Abstract: Apache Spark is explored as a tool for
      analyzing large data sets from the magnetic fusion simulation code
      XGC1. Implementation details of Apache Spark on the NERSC Edison
      supercomputer are discussed, including binary file reading and
      parameter setup. An unsupervised machine learning algorithm,
      k-means clustering, is applied to XGC1 particle distribution
      function data, showing that highly turbulent spatial regions do
      not have common coherent structures but rather broad, ring-like
      structures in velocity space. 
      
    
Submitted to: IEEE NYSDS Conference Proceedings
Download PPPL-5311 (pdf, 1.4 MB, 6 pp)