
List of Projects

1 - Cylon

Cylon: High-Performance Data Engineering Framework

One of the most exciting aspects of the Big Data era for both the industry and research communities is the incredible progress being made in the domains of machine and deep learning. Modern applications demand resources beyond what a single node can supply. The difficulties that the overall data processing environment must address include a variety of data engineering tasks for pre- and post-processing, communication, and system integration. The ability of data analytics tools to interface quickly with existing frameworks in a variety of languages is a crucial requirement, as it increases user productivity and efficiency. All of this calls for an efficient, distributed, and integrated approach to data processing, yet many of today’s popular data analytics solutions are unable to meet all of these criteria simultaneously.

In this project, we introduce Cylon, an open-source, high-performance distributed data processing toolkit that integrates easily with current Big Data and AI/ML frameworks. It is built on a compact data structure, with a versatile C++ core and language bindings for Python, Java, and C++ on top of it.

We describe Cylon’s design and demonstrate how it can be used either as a standalone framework or imported as a library into existing applications. Early tests reveal that Cylon boosts well-known technologies such as Apache Spark and Dask, with significant performance gains for crucial operations and improved component linkages. The ultimate goal is to demonstrate how Cylon’s design supports cross-platform usage with minimal overhead, including with well-known AI tools such as PyTorch, TensorFlow, and Jupyter notebooks.
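The performance-critical operations Cylon accelerates are relational ones over columnar tables. As a conceptual illustration only (plain Python over dict-of-lists tables, not the actual PyCylon API; the column names and data are invented), a single-node hash equi-join of the kind Cylon distributes looks like this:

```python
# Conceptual sketch: Cylon implements such joins in its optimized C++ core
# and distributes them across processes; this stdlib version only shows the
# semantics of the operation.

def hash_join(left, right, key):
    """Inner-join two columnar tables (dicts of equal-length lists) on `key`."""
    # Build a hash index over the right table's key column.
    index = {}
    for i, k in enumerate(right[key]):
        index.setdefault(k, []).append(i)

    out = {col: [] for col in list(left) + [c for c in right if c != key]}
    for i in range(len(left[key])):
        for j in index.get(left[key][i], []):
            for col in left:
                out[col].append(left[col][i])
            for col in right:
                if col != key:
                    out[col].append(right[col][j])
    return out

orders = {"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]}
users  = {"id": [2, 3, 4], "name": ["ada", "bob", "eve"]}
joined = hash_join(orders, users, "id")
# joined["amount"] == [20.0, 30.0]; joined["name"] == ["ada", "bob"]
```

In Cylon the same logical operation runs partition-wise across workers, with the library handling the shuffle of rows by key between processes.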

2 - Cloudmask


MLCommons Science Cloudmask benchmark

The scientific objective of CloudMask is to develop a segmentation model that classifies the pixels in satellite images, determining whether a given pixel belongs to a cloud or to clear sky. The benchmark can be considered both training- and inference-focused; the science metric is the classification accuracy, that is, the number of pixels classified correctly. The performance metrics are inference timing and the scalability of training across a number of GPUs.
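The science metric described above can be sketched directly: the fraction of pixels whose predicted label (cloud = 1, clear sky = 0) matches the ground-truth mask. The masks and function name below are illustrative, not the benchmark's actual code:

```python
# Minimal sketch of per-pixel classification accuracy, the CloudMask
# science metric: correctly classified pixels divided by total pixels.

def pixel_accuracy(predicted, truth):
    """Fraction of pixels classified correctly; masks are 2-D lists of 0/1."""
    correct = total = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            correct += (p == t)
            total += 1
    return correct / total

pred = [[1, 0, 1],
        [0, 0, 1]]
true = [[1, 1, 1],
        [0, 0, 0]]
print(pixel_accuracy(pred, true))  # 4 of 6 pixels match -> 0.666...
```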

Participants and Collaborators

Deliverables

  • Working code that can run the benchmark in parallel
  • Report
  • Submission of the benchmark to MLCommons



3 - Cosmoflow


High Performance Computing with the CosmoFlow Benchmark

CosmoFlow is a deep learning application that uses a 3D convolutional neural network to estimate cosmological parameters from the large-scale structure of the universe produced by cosmological simulations. Developed by researchers at NERSC in collaboration with Intel and Cray, the project demonstrated substantial speedups over traditional analysis methods by leveraging high-performance computing. The neural network is trained on 3D density grids generated by cosmological simulations, and its predictions provide valuable information about the formation of galaxies, the distribution of dark matter, and the overall evolution of the universe. By enabling faster and more efficient analysis of simulation data, CosmoFlow opens new avenues of research in cosmology.
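The network's 3D input grids are produced by binning simulated particle positions into voxels. A stdlib-only sketch of that binning step follows; the grid size, box size, and particle coordinates are illustrative, not the benchmark's actual preprocessing:

```python
# Hedged sketch: histogram 3-D particle positions into a grid x grid x grid
# count volume, the kind of density cube a 3-D CNN like CosmoFlow consumes.

def voxelize(particles, box_size, grid):
    """Bin 3-D particle positions in [0, box_size) into a voxel count grid."""
    counts = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    for x, y, z in particles:
        # Map each coordinate to a voxel index, clamping the upper edge.
        i = min(int(x / box_size * grid), grid - 1)
        j = min(int(y / box_size * grid), grid - 1)
        k = min(int(z / box_size * grid), grid - 1)
        counts[i][j][k] += 1
    return counts

parts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.95, 0.9, 0.85)]
cube = voxelize(parts, box_size=1.0, grid=2)
print(cube[0][0][0], cube[1][1][1])  # 1 2
```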



  1. MLCommons repository of cosmoflow
  2. DSC repository of cosmoflow

4 - Cylon on AWS

High Performance Data Engineering with Cylon on Amazon Web Services


In recent years the data engineering discipline has been greatly impacted by Artificial Intelligence (AI) and Machine Learning (ML). This has ushered in research on the speed, performance, and optimization of such processes [2]. To meet these ends, many frameworks have been proposed. One such framework is CylonData [1], an architecture in which performance-critical operations are moved to a highly optimized library. Moreover, the architecture provides the capability to leverage the performance of in-memory data and of operations and data distributed across processes, a key requirement for processing large data engineering workloads at scale. Such benefits are realized, for example, in the conversion from tabular format to the tensor format required for ML/DL, or via the use of relational algebra operations such as join, select, and project. More specifically, Cylon is described as “a fast, scalable distributed memory data parallel library for processing structured data” [1].

While CylonData has focused on an MPI implementation using HPC ML resources, the research here is to port it to serverless compute infrastructure within AWS services such as AWS Lambda, ECS, EC2, Route 53, ALB, and EFS. Completing this achieves two things. First, it showcases that this work is available not only on HPC but also on AWS serverless and serverful compute resources. Second, it enables an extensive benchmark comparison between HPC and serverless/serverful computing, showcasing the strengths and weaknesses of both approaches.
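The serverless direction can be sketched as a minimal AWS Lambda-style handler. The `(event, context)` signature is Lambda's Python convention; the payload fields (`"left"`, `"right"`, `"key"`) and the in-Python join are placeholders for calls into the Cylon library that the milestones below deliver as a Lambda layer:

```python
# Illustrative sketch only: a Lambda-shaped entry point whose event carries
# two row-oriented tables and a join key. A real deployment would invoke the
# Cylon library (shipped in a layer) instead of this stand-in join.

def handler(event, context=None):
    left, right, key = event["left"], event["right"], event["key"]
    # Stand-in for a Cylon join executed inside the function.
    right_index = {row[key]: row for row in right}
    joined = [{**row, **right_index[row[key]]}
              for row in left if row[key] in right_index]
    return {"statusCode": 200, "rows": joined}

resp = handler({"left":  [{"id": 1, "a": 10}, {"id": 2, "a": 20}],
                "right": [{"id": 2, "b": "x"}],
                "key": "id"})
print(resp["rows"])  # [{'id': 2, 'a': 20, 'b': 'x'}]
```

The serverful ECS/OpenMPI path would run the same logical operation as long-lived MPI ranks instead of per-invocation functions, which is exactly the comparison the benchmark targets.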



  • 1 April 2023: Merge of UCC/UCX bootstrapping to cylondata/man
  • 25 April 2023: Execution of serverful Cylon delivered via ECS infrastructure using OpenMPI
  • 10 May 2023: Prototype of serverless delivery of the Cylon library via AWS Lambda layers
  • 30 May 2023: Serverless cache infrastructure facade prototype and support via abstraction in the Cylon source


  1. “Cylon.” cylondata/cylon, Accessed 9 September 2022.

  2. “Cylon Library for Fast & Scalable Data Engineering.” Cylon Blog, Accessed 9 September 2022.

5 - Data Compiler

Optimizing Large-Scale Deep Learning by Data Movement-Aware Compiler


Project Members

Project Summary

This project aims to address data movement, a known major efficiency bottleneck of distributed training [1], by designing a tensor compiler [2] that can capture and optimize the data-movement graph and schedule at compile time, so that execution becomes fully static for higher performance. Such Ahead-of-Time (AOT) optimization also enables opportunities for auto-parallelism and pipelining, as in [3, 5]. The current exploration leverages Multi-Level Intermediate Representation (MLIR) [4] to include data-movement information in the compilation passes.
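The core idea of a fully static schedule can be shown with a toy "compiler pass": once the operator graph, including its communication ops, is known at compile time, one topological ordering fixes the entire execution. The op names and graph encoding below are invented for illustration; the real work targets MLIR dialects, not Python dicts:

```python
# Hedged sketch: resolve a data-movement-aware operator graph into one
# static execution order ahead of time, so the runtime follows a fixed plan.

def static_schedule(graph):
    """graph: {op: [ops it depends on]}; returns one valid execution order."""
    order, seen = [], set()

    def visit(op):
        if op in seen:
            return
        seen.add(op)
        for dep in graph[op]:
            visit(dep)
        order.append(op)   # emit only after all producers are scheduled

    for op in graph:
        visit(op)
    return order

# matmul needs its inputs gathered (all_gather); the loss needs matmul's
# output reduced across workers (all_reduce) -- both movements are visible
# to the compiler, not discovered at runtime.
graph = {
    "load":       [],
    "all_gather": ["load"],
    "matmul":     ["all_gather"],
    "all_reduce": ["matmul"],
    "loss":       ["all_reduce"],
}
print(static_schedule(graph))
# ['load', 'all_gather', 'matmul', 'all_reduce', 'loss']
```

With the communication ops in the graph, later passes could fuse, overlap, or re-place them, which is the optimization opportunity the milestones below pursue.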


  • Proposal: WIP
  • GitHub repo: not ready for open access
  • Report: TBD
  • Paper: TBD


  1. 15 April 2023: Define an MLIR dialect that can describe the data movement of deep learning
  2. 1 June 2023: Work out optimizations over the dialect conversion and lowering passes
  3. 15 July 2023: Prototype the backend code generation for real-world tests


  1. Ivanov, Andrei, et al. “Data movement is all you need: A case study on optimizing transformers.” Proceedings of Machine Learning and Systems 3 (2021): 711-732.
  2. Kjolstad, Fredrik, et al. “The tensor algebra compiler.” Proceedings of the ACM on Programming Languages 1.OOPSLA (2017): 1-29.
  3. Yuan, Jinhui, et al. “Oneflow: Redesign the distributed deep learning framework from scratch.” arXiv preprint.
  4. Vasilache, Nicolas, et al. “Composable and modular code generation in MLIR: A structured and retargetable approach to tensor compiler construction.” arXiv preprint.
  5. Zheng, Lianmin, et al. “Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.” 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22).

6 - NIST

NIST: Reusable Hybrid Multi-Services Data Analytics Framework

Over the last several years, the computational landscape for conducting data analytics has completely changed. While in the past many of these activities were undertaken in isolation by companies and research institutions, today’s infrastructure constitutes a wealth of services offered by a variety of providers, with opportunities for reuse and interaction.

We will expand analytics services by developing a framework for reusable hybrid multi-service data analytics. This includes (a) a technology review that explicitly targets the intersection of hybrid multi-provider analytics services, (b) enhancing the concept of services to showcase how hybrid as well as multi-provider services can be integrated and reused via the proposed framework, (c) addressing analytics service composition, and (d) integrating container technologies to achieve state-of-the-art analytics service deployment capabilities.
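Item (c), service composition, can be sketched as a registry of services from different providers exposed behind one uniform interface and chained into a hybrid pipeline. The registry keys, service names, and data are invented for illustration, not the framework's actual design:

```python
# Hedged sketch of multi-provider service composition: services register
# under provider-qualified names and compose into one reusable pipeline.

SERVICES = {}

def register(name):
    def wrap(fn):
        SERVICES[name] = fn
        return fn
    return wrap

@register("onprem.clean")
def clean(data):
    # An on-premises cleaning service: drop missing values.
    return [x for x in data if x is not None]

@register("cloud.summarize")
def summarize(data):
    # A cloud-hosted summary service over the cleaned data.
    return {"count": len(data), "mean": sum(data) / len(data)}

def compose(*names):
    """Chain independently provided services into one analytics pipeline."""
    def pipeline(data):
        for name in names:
            data = SERVICES[name](data)
        return data
    return pipeline

analyze = compose("onprem.clean", "cloud.summarize")
print(analyze([4, None, 8]))  # {'count': 2, 'mean': 6.0}
```

In the actual framework each registered service would be a deployed container rather than a local function, but the composition contract is the same.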

PI: Gregor von Laszewski

7 - OSMI


MLCommons OSMI Benchmark

OSMI-Bench explores the optimal deployment of machine-learned surrogate (MLS) models for rotorcraft aerodynamics on high-performance computers (HPC). In this benchmark, we test three rotorcraft models for optimal deployment configurations: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Temporal Convolutional Neural Network (TCNN) models with 2M, 44M, and 212M trainable parameters, respectively [1]. Surrogate models trained on synthetic data were selected because we are solely focused on inference efficiency, not model accuracy. We are now running the benchmark on the Rivanna HPC at the University of Virginia to find the optimal deployment scenario for each model, and we plan to develop more models to benchmark, such as a transformer-encoder natural language model. We are also investigating the relationship between batch size, GPU type, and concurrency on the one hand and inference throughput/time on the other. We will soon explore running the load balancers used in the OSMI-Bench framework, such as the Python concurrent.futures thread pool, HAProxy, and mpi4py, on Rivanna.
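The throughput-versus-concurrency measurement can be sketched with the concurrent.futures thread pool mentioned above. The "model" here is a stub that sleeps to mimic inference latency; the batch size, worker counts, and latency are made-up parameters, not OSMI-Bench's real configuration:

```python
# Hedged sketch: measure inference throughput (samples/second) as requests
# are spread across a thread pool, the kind of sweep OSMI-Bench performs
# over batch size, concurrency, and hardware.

import time
from concurrent.futures import ThreadPoolExecutor

def fake_infer(batch):
    time.sleep(0.01)          # stand-in for one model inference call
    return len(batch)         # "predictions" returned = batch size

def throughput(n_requests, batch_size, workers):
    """Samples per second when n_requests batches run across `workers`."""
    batches = [[0.0] * batch_size for _ in range(n_requests)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = sum(pool.map(fake_infer, batches))
    elapsed = time.perf_counter() - start
    return done / elapsed

# With fixed per-request latency, more concurrency should raise throughput.
print(f"1 worker : {throughput(8, batch_size=32, workers=1):.0f} samples/s")
print(f"4 workers: {throughput(8, batch_size=32, workers=4):.0f} samples/s")
```

Swapping `fake_infer` for a call to a served model (and the thread pool for HAProxy or mpi4py) yields the other deployment scenarios the benchmark compares.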



  • 31 March 2023: Graph the relationship between configuration and inference performance for each model; complete the OSMI-Bench documentation for Rivanna
  • 15 April 2023: Run the load balancers on Rivanna
  • 1 May 2023: Develop and benchmark new models


  1. Wes Brewer et al. “Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC.” 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), 15 November 2021. doi:10.1109/MLHPC54614.2021.00008.
  2. Wes Brewer. “OSMI-Bench.”


Nate Kimball, Wes Brewer, Gregor von Laszewski, Geoffrey C. Fox

8 - Timeseries


Deep Learning Predictions for Hydrology Data

Here is our research abstract: This research strives to show that one can study several sets of sequences or time series of an underlying evolution operator using a deep learning network. The language of geospatial time series is used as a common application type, but the series can be any sequence, and the sequences can be in any collection (bag), not just Euclidean space-time, as we only need sequences labeled in some way and having properties consequent on this label (position in abstract space). This problem has been successfully tackled by deep learning in many ways and within numerous research fields. The most advanced work is estimated to be in natural language processing and transportation (ride-hailing). The second case, with traffic and the number of people needing rides, is a geospatial problem with significant constraints from spatial locality. As in many problems, the data here is typically space-time-stamped events; however, these can be converted into spatial time series by binning in space and time.

Comparing deep learning for such time series with the coupled ordinary differential equations used to describe multi-particle systems motivates the introduction of an evolution operator that describes the time dependence of complex systems. With an appropriate training process, our research interprets deep learning applied to spatial time series as a particular approach to finding the time evolution operator for the complex system giving rise to the spatial time series. Whimsically, we view this training process as determining hidden variables that represent the theory (as in Newton’s laws) of the complex system. We formulate this problem in general and present an open-source package, FFFFWNPF, as a Jupyter notebook for training and inference using either recurrent neural networks or a variant of the transformer (multi-headed attention) approach. This assumes an outside data engineering step that prepares the data to be ingested into FFFFWNPF.

As the approach, a comparison of transformer and LSTM networks is presented for time series of hydrology streamflow, temperature, and precipitation data collected on 671 catchments from each of three nations: the US, the UK, and Chile. This research is intended to explore how complex systems of different types (different membership linkages) are described by different types of deep learning operators. Geometric structure in space and multi-scale behavior in both time and space will be important. We expect that the current forecasting formulation will be easily extended to sequence-to-sequence problems.
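The outside data engineering step mentioned above can be sketched as turning one streamflow series into (window, next-value) training pairs for an LSTM or transformer forecaster. The window length and toy series are illustrative, not the project's actual preprocessing:

```python
# Hedged sketch: slice a time series into fixed-length input windows with
# one-step-ahead targets, the standard supervised framing for sequence models.

def make_windows(series, length):
    """Return (inputs, targets): each input is `length` steps, target is next."""
    inputs, targets = [], []
    for start in range(len(series) - length):
        inputs.append(series[start:start + length])
        targets.append(series[start + length])
    return inputs, targets

flow = [3.1, 3.4, 2.9, 5.0, 4.2, 3.8]   # toy daily streamflow values
X, y = make_windows(flow, length=3)
print(X[0], "->", y[0])   # [3.1, 3.4, 2.9] -> 5.0
print(len(X))             # 3 training pairs from 6 observations
```

The same windowing applies per catchment, so the 671-catchment datasets become a bag of labeled sequences in exactly the sense the abstract describes.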



  1. GitHub repo: TBD
  2. Paper: TBD


  1. Newman, A. & Clark, M. & Sampson, Kevin & Wood, A. & Hay, Lauren & Bock, Andy & Viger, Roland & Blodgett, David & Brekke, Levi & Arnold, Jeffrey & Hopson, Thomas & Duan, Qingyun. (2015). Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: Data set characteristics and assessment of regional variability in hydrologic model performance. Hydrology and Earth System Sciences. 19. 209-223. 10.5194/hess-19-209-2015.
  2. Alvarez-Garreton, Camila & Mendoza, Pablo & Boisier, Juan P. & Addor, Nans & Galleguillos, Mauricio & Zambrano-Bigiarini, Mauricio & Lara, Antonio & Puelma, Cristobal & Cortés, Gonzalo & Garreaud, Rene & Mcphee, James & Ayala, Álvaro. (2018). The CAMELS-CL dataset: Catchment attributes and meteorology for large sample studies-Chile dataset. Hydrology and Earth System Sciences. 22. 5817-5846. 10.5194/hess-22-5817-2018.
  3. Coxon, G. & Addor, Nans & Bloomfield, John & Freer, Jim & Fry, Matt & Hannaford, Jamie & Howden, Nicholas & Lane, Rosanna & Lewis, Melinda & Robinson, Emma & Wagener, Thorsten & Woods, Ross. (2020). CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain. Earth System Science Data. 12. 2459-2483. 10.5194/essd-12-2459-2020.
  4. Sit, Muhammed & Demiray, Bekir & Xiang, Zhongrun & Ewing, Gregory & Sermet, Yusuf & Demir, Ibrahim. (2020). A comprehensive review of deep learning applications in hydrology and water resources. Water Science and Technology. 82. 10.2166/wst.2020.369.
  5. Fox, Geoffrey & Rundle, John & Donnellan, Andrea & Feng, Bo. (2021). Earthquake Nowcasting with Deep Learning. 10.13140/RG.2.2.27588.14722.
  6. Vaswani, Ashish & Shazeer, Noam & Parmar, Niki & Uszkoreit, Jakob & Jones, Llion & Gomez, Aidan & Kaiser, Lukasz & Polosukhin, Illia. (2017). Attention Is All You Need.
  7. Huang, Xinyuan & Fox, Geoffrey & Serebryakov, Sergey & Mohan, Ankur & Morkisz, Pawel & Dutta, Debojyoti. (2019). Benchmarking Deep Learning for Time Series: Challenges and Directions. 5679-5682. 10.1109/BigData47090.2019.9005496.