High-performance Algorithms for Data Analysis in Computational Cosmology
Modern cosmological simulations are among the world's largest and most demanding numerical computations and are run on state-of-the-art supercomputers. The codes use N-body and mesh-based methods in gravity-only solvers, and both Lagrangian and Eulerian schemes to model gas dynamics. Even complex astrophysical effects, such as star formation and feedback from supernovae and active galactic nuclei, can be modeled by using sub-grid methods to overcome resolution limitations. Astonishingly, the raw data outputs from system-scale runs on supercomputers can easily reach several petabytes for a single simulation snapshot. The richness and realism attained in a modern cosmological simulation provide a wealth of information that requires sophisticated analysis techniques to extract scientific knowledge about the universe being represented. For example, highly resolved structures in a large simulation volume provide a means to study the theoretical predictions of gravitational lensing and to investigate the astrophysical and observational systematics that could mimic true scientific signals. Indeed, the modern era of precision cosmology is driven by the careful calibration of errors (by providing covariance estimates) and by testing, optimizing, and validating observational strategies. While modern simulation codes have achieved unprecedented performance and scalability, the same is often not true of many important analysis tools, which frequently become the bottleneck in complex data processing pipelines. This dissertation addresses the scalability problems that data processing pipelines encounter when using large-volume, high-resolution N-body simulations by developing new parallel algorithms for fundamental analyses performed on the particles, where the sheer amount of data becomes intrinsically problematic.
First, we introduce a high-quality particle-based method for tracking the evolution of Friends-of-Friends halos in simulations to build merger trees. Second, we leverage merger tree information and use particle groups near halo centers, called "cores", to track the evolution of halo substructure. Third, we improve the interpolation accuracy and efficiency of the Delaunay tessellation field estimator for surface density field reconstruction by developing an approach that takes full advantage of the adaptive triangular mesh for line-of-sight integration. Lastly, we show how these methods are applied to generate a synthetic sky catalog for a mock astronomical survey.
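
To make the idea of particle-based halo tracking concrete, the following is a minimal serial sketch, not the parallel algorithm developed in the dissertation. It assumes that each snapshot is available as a mapping from halo ID to the IDs of its member particles, and that halos within a snapshot are disjoint, as Friends-of-Friends groups are. The function names `match_halos` and `progenitors` and the toy data are hypothetical, chosen only for illustration. Each earlier halo is linked to the later halo that receives the largest number of its particles, which is one common way to define a descendant.

```python
from collections import Counter, defaultdict


def match_halos(earlier, later):
    """Link each earlier halo to the later halo holding most of its particles.

    earlier, later: dict mapping halo ID -> iterable of particle IDs
    (halos within one snapshot are assumed to be disjoint).
    Returns a dict: earlier halo ID -> (later halo ID, shared particle count).
    Earlier halos with no particles in any later halo are omitted.
    """
    # Invert the later snapshot so each particle ID points to its halo.
    owner = {}
    for halo_id, particles in later.items():
        for pid in particles:
            owner[pid] = halo_id

    links = {}
    for halo_id, particles in earlier.items():
        # Count how many of this halo's particles land in each later halo.
        counts = Counter(owner[pid] for pid in particles if pid in owner)
        if counts:
            links[halo_id] = counts.most_common(1)[0]
    return links


def progenitors(links):
    """Group earlier halos by descendant: one level of a merger tree."""
    tree = defaultdict(list)
    for early_id, (late_id, _shared) in links.items():
        tree[late_id].append(early_id)
    return dict(tree)


# Toy data (hypothetical): halos 1 and 2 merge into halo 7,
# while halo 3 survives as halo 8 and picks up one stray particle.
snap_a = {1: [10, 11, 12, 13], 2: [20, 21, 22], 3: [30, 31]}
snap_b = {7: [10, 11, 12, 20, 21, 22, 40], 8: [30, 31, 13]}

links = match_halos(snap_a, snap_b)
print(links)
print(progenitors(links))
```

Applying such a matching step to every consecutive pair of snapshots yields the progenitor and descendant links from which a merger tree can be assembled. At the data volumes described above, the particle-to-halo lookup would have to be distributed across ranks rather than held in a single in-memory dictionary.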