High Performance Computing Enhanced Apache Big Data Stack

NIST BIG DATA PUBLIC WORKING GROUP BIG DATA USE CASE SURVEY

Publications:

  1. Supun Kamburugamuve, Kannan Govindarajan, Pulasthi Wickramasinghe, Vibhatha Abeykoon, Geoffrey Fox, "Twister2: Design of a Big Data Toolkit" in EXAMPI 2017 workshop November 12 2017 at SC17 conference, Denver CO 2017.
  2. Kannan Govindarajan, Supun Kamburugamuve, Pulasthi Wickramasinghe, Vibhatha Abeykoon, Geoffrey Fox, "Task Scheduling in Big Data - Review, Research Challenges, and Prospects" Technical Report October 31 2017.
  3. Langshi Chen, Bo Peng, Zhao Zhao, Saliya Ekanayake, Anil Vullikanti, Madhav Marathe, Shaojuan Zhu, Emily Mccallum, Lisa Smith, Lei Jiang, Judy Qiu, "A New Pipelined Adaptive-Group Communication for Large-Scale Subgraph Counting", Technical Report October 22 2017.
  4. Indiana University (Fox, Qiu, Crandall, von Laszewski), Rutgers (Jha), Virginia Tech (Marathe), Kansas (Paden), Stony Brook (Wang), Arizona State (Beckstein), Utah (Cheatham), "Summary of NSF 1443054: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science", September 23 2017.
  5. Supun Kamburugamuve, Karthik Ramasamy, Martin Swany, Geoffrey Fox, "Low latency stream processing: Apache Heron with Infiniband & Intel Omni-Path", technical report September 2017. To be presented at UCC conference at Austin Texas December 5-8, 2017.
  6. Zhao Zhao, Langshi Chen, Mihai Avram, Meng Li, Guanying Wang, Ali Butt, Maleq Khan, Madhav Marathe, Judy Qiu, Anil Vullikanti, "Finding and counting tree-like subgraphs using MapReduce", technical report, August 2017.
  7. Geoffrey C. Fox, Vatche Ishakian, Vinod Muthusamy, Aleksander Slominski, "Status of Serverless Computing and Function-as-a-Service (FaaS) in Industry and Research", Report from workshop and panel at the First International Workshop on Serverless Computing (WoSC) Atlanta, June 5 2017, arXiv:1708.08028.
  8. Supun Kamburugamuve and Geoffrey Fox, "Designing Twister2: Efficient Programming Environment Toolkit for Big Data" Digital Science Center, technical report August 6 2017.
  9. L. Chen, J. Qiu. "Development of Harp-DAAL Interface" Technical Report (December 2016).
  10. L. Chen, J. Qiu. "Harp-DAAL: A High Performance Data-Intensive Machine Learning Framework" Technical Report (March 2017).
  11. B. Peng, B. Zhang, L. Chen, M. Avram, R. Henschel, C. Stewart, S. Zhu, E. Mccallum, L. Smith, T. Zahniser, J. Omer, J. Qiu. "HarpLDA+: Optimizing Latent Dirichlet Allocation for Parallel Efficiency" Technical Report (August 2017) to be published in IEEE Big Data conference December 11-14 2017, Boston, MA.
  12. Supun Kamburugamuve, Karthik Ramasamy, Martin Swany, Geoffrey Fox, "Low Latency Stream Processing: Twitter Heron with Infiniband and Omni-Path", Proceedings of Strata Data Conference New York NY September 26-28, 2017.
  13. Geoffrey C. Fox and Shantenu Jha, "A Tale of Two Convergences: Applications and Computing Platforms", Extended abstract for 2017 New York Scientific Data Summit (NYSDS'17) Data-Driven Discovery in Science and Industry, August 6-9, 2017.
  14. Geoffrey C. Fox, Devarshi Ghoshal, Shantenu Jha, Andre Luckow, Lavanya Ramakrishnan, "Streaming Computational Science: Applications, Technology and Resource Management for HPC" Extended abstract for 2017 New York Scientific Data Summit (NYSDS'17) Data-Driven Discovery in Science and Industry, August 6-9, 2017.
  15. Geoffrey C. Fox, Shantenu Jha, "Conceptualizing A Computing Platform for Science Beyond 2020: To Cloudify HPC, or HPCify Clouds?", Proceedings of IEEE Cloud 2017 Conference June 25-30 2017, Honolulu, Hawaii.
  16. Hyungro Lee and Geoffrey C. Fox, "Efficient Software Defined Systems using Common Core Components", Proceedings of IEEE Cloud 2017 Conference June 25-30 2017, Honolulu, Hawaii.
  17. Langshi Chen, Bo Peng, Bingjing Zhang, Tony Liu, Yiming Zou, Lei Jiang, Robert Henschel, Craig Stewart, Zhang Zhang, Emily Mccallum, Tom Zahniser, Jon Omer, Judy Qiu, "Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters", Proceedings of IEEE Cloud 2017 Conference June 25-30 2017, Honolulu, Hawaii.
  18. Hyungro Lee, "Software Defined Systems with DevOps Tools and Infrastructure Provisioning", Ph.D. Forum at IPDPS conference Orlando FL May 30-June 2, 2017.
  19. Supun Kamburugamuve, "Designing Efficient Programming Environment Toolkits for Big Data Applications: Integrating Parallel and Distributed Computing Runtimes", Ph.D. Thesis Proposal Indiana University July 6 2017.
  20. Bingjing Zhang, "Harp: A Machine Learning Framework on Top of the Collective Communication Layer for the Big Data Software Stack", Indiana University Ph.D. Thesis May 2017.
  21. Bingjing Zhang, Bo Peng, Judy Qiu, "Parallelizing Big Data Machine Learning Applications with Model Rotation", in the book series on Advances in Parallel Computing published by IOS Press, 2017.
  22. Fox, Geoffrey; Jha, Shantenu; Ramakrishnan, Lavanya; "STREAM2016: Streaming Requirements, Experience, Applications and Middleware Workshop", Final Report of Workshop March 22-23 2016 at Washington DC, LBNL--1006355 October 1, 2016.
  23. Hyungro Lee, "Building Software Defined Systems on HPC and Clouds", PhD Proposal Indiana University February 19 2017.
  24. Geoffrey Fox, David Crandall, Judy Qiu, Gregor Von Laszewski, Shantenu Jha, John Paden, Oliver Beckstein, Tom Cheatham, Madhav Marathe, Fusheng Wang, "Tutorial Program", BigDat 2017 MIDAS and SPIDAL Tutorial Bari Italy February 13-14 2017.
  25. Badi' Abdul-Wahid, Hyungro Lee, Gregor von Laszewski, and Geoffrey Fox, "Scripting Deployment of NIST Use Cases" Technical Report January 20 2017.
  26. Indiana University (Fox, Qiu, Crandall, von Laszewski), Rutgers (Jha), Virginia Tech (Marathe), Kansas (Paden), Stony Brook (Wang), Arizona State (Beckstein), Utah (Cheatham), "Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science NSF14-43054 Progress Report" July 2016 21 month Project Report.
  27. Supun Kamburugamuve, Pulasthi Wickramasinghe, Saliya Ekanayake, Chathuri Wimalasena, Milinda Pathirage, Geoffrey Fox, "TSmap3D: Browser Visualization of High Dimensional Time Series Data", Technical report May 10 2016. Published in Advances in High Dimensional Big Data (2nd Workshop) in 2016 IEEE International Conference on Big Data, December 5-8, 2016, Washington DC, USA.
  28. Rick Wagner, Philip Papadopoulos, Dmitry Mishin, Trevor Cooper, Mahidhar Tatineti (San Diego Supercomputer Center), Gregor von Laszewski, Fugang Wang, Geoffrey C. Fox, "User Managed Virtual Clusters in Comet", Proceedings of XSEDE All Hands Meeting, Miami July 17-21, 2016. ACM Digital Library DOI 10.1145/2949550.2949555.
  29. Bingjing Zhang, Bo Peng, Judy Qiu, "Model Data-Centric Computation Abstractions in Machine Learning Applications", in 3rd Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR2016), held in conjunction with SIGMOD/PODS2016, July 1, 2016.
  30. Bingjing Zhang, "A Collective Communication Layer for the Software Stack of Big Data Analytics", Doctoral Symposium. Proceedings of IEEE International Conference on Cloud Engineering (IC2E2016) Conference, April 4-8, 2016, Berlin, Germany.
  31. Davis CA, Ciampaglia GL, Aiello LM, Chung K, Conover MD, Ferrara E, Flammini A, Fox GC, Gao X, Gonçalves B, Grabowicz PA, Hong K, Hui P, McCaulay S, McKelvey K, Meiss MR, Patil S, Peli Kankanamalage C, Pentchev V, Qiu J, Ratkiewicz J, Rudnick A, Serrette B, Shiralkar P, Varol O, Weng L, Wu T, Younge AJ, Menczer F. (2016) "OSoMe: the IUNI observatory on social media". PeerJ Computer Science 2:e87.
  32. Supun Kamburugamuve, Pulasthi Wickramasinghe, Saliya Ekanayake, Geoffrey C. Fox, "Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink", The International Journal of High Performance Computing Applications, July 2, 2017.
  33. Project: Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science SPIDAL NSF14-43054
  34. Saliya Ekanayake, Supun Kamburugamuve, Pulasthi Wickramasinghe, Geoffrey Charles Fox, "Java Thread and Process Performance for Parallel Machine Learning on Multicore HPC Clusters", Technical Report August 8 2016. Published in 2016 IEEE International Conference on Big Data, December 5-8, 2016, Washington DC, USA.
  36. Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake, Supun Kamburugamuve, White Paper: Big Data, Simulations and HPC Convergence, Technical Report May 20 2016. Presented at BDEC Frankfurt workshop June 16 2016.
  37. Bingjing Zhang, Bo Peng, Judy Qiu, High Performance LDA through Collective Model Communication Optimization, Proceedings of International Conference on Computational Science (ICCS2016) conference, June 6-8, 2016, San Diego, California.
  38. Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake, and Supun Kamburugamuve, Big Data, Simulations and HPC Convergence, Technical Report January 30 2016. DOI: 10.13140/RG.2.1.1858.8566 for WBDB 2015 Seventh Workshop on Big Data Benchmarking, New Delhi India, December 14, 2015. Published in Springer Lecture Notes in Computer Science LNCS 10044.
  39. Supun Kamburugamuve, Saliya Ekanayake, Milinda Pathirage, Geoffrey Fox, Towards High Performance Processing of Streaming Data in Large Data Centers Technical Report January 26 2016. Published in proceedings of HPBDC 2016 IEEE International Workshop on High-Performance Big Data Computing in conjunction with The 30th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago Hyatt Regency, Chicago, Illinois USA, Friday, May 27th, 2016.
  40. Saliya Ekanayake, Supun Kamburugamuve and Geoffrey Fox, SPIDAL: High Performance Data Analytics with Java and MPI on Large Multicore HPC Clusters, Technical Report January 5 2016; Proceedings of 24th High Performance Computing Symposium (HPC 2016), April 3-6, 2016, Pasadena, CA, USA as part of the SCS Spring Simulation Multi-Conference (SpringSim'16).
  41. Geoffrey Fox, Shantenu Jha, Lavanya Ramakrishnan, "STREAM2015 Final Report" of Workshop Indianapolis October 27-28, 2015.
  42. Geoffrey Fox, Judy Qiu, Shantenu Jha, Supun Kamburugamuve, and Andre Luckow, Implications of the HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack for workflows, in White paper for DoE NGNS/CS Scientific Workflows Workshop http://extremescaleresearch.labworks.org/. April 20-21 2015. Rockville, MD. http://dsc.soic.indiana.edu/publications/WorkflowsandHPC-ABDS.pdf.
  43. Geoffrey Fox, Judy Qiu, Shantenu Jha, Supun Kamburugamuve and Andre Luckow, HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack Invited talk at 2nd International Workshop on Scalable Computing For Real-Time Big Data Applications (SCRAMBL'15) at CCGrid2015, the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, held in Shenzhen, Guangdong, China http://dsc.soic.indiana.edu/publications/HPC-ABDSDescribed_final.pdf
  44. Bingjing Zhang, Yang Ruan, and Judy Qiu, Harp: Collective Communication on Hadoop in IEEE International Conference on Cloud Engineering (IC2E). March 9-12, 2015. Tempe AZ. http://dsc.soic.indiana.edu/publications/HarpQiuZhang.pdf.
  45. Shantenu Jha, Andre Luckow, Pradeep Mantha, A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures? 2015. [Online]. Available: http://arxiv.org/abs/1501.05041
  46. Dan Reed and Jack Dongarra. Exascale Computing and Big Data: The Next Frontier. 2014 [accessed 2015 March 8]; Available from: http://www.netlib.org/utk/people/JackDongarra/PAPERS/Exascale-Reed-Dongarra.pdf.
  47. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, Saliya Ekanayake, and Andre Luckow, Towards a Comprehensive Set of Big Data Benchmarks. February 15, 2015. http://dsc.soic.indiana.edu/publications/OgreFacetsv9.pdf.
  48. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, and Andre Luckow, Ogres: A Systematic Approach to Big Data Benchmarks, in Big Data and Extreme-scale Computing (BDEC) January 29-30, 2015. Barcelona. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/OgreFacets.pdf.
  49. NIST Big Data Use Case & Requirements. 2013 [accessed 2015 March 1]; Available from: http://bigdatawg.nist.gov/V1_output_docs.php.
  50. Geoffrey Fox and Wo Chang, Big Data Use Cases and Requirements, in 1st Big Data Interoperability Framework Workshop: Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data March 18 - 21, 2014. San Diego Supercomputer Center, San Diego. http://dsc.soic.indiana.edu/publications/NISTUseCase.pdf.
  51. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, and Andre Luckow, Towards an Understanding of Facets and Exemplars of Big Data Applications, in 20 Years of Beowulf: Workshop to Honor Thomas Sterling's 65th Birthday October 14, 2014. Annapolis. http://dsc.soic.indiana.edu/publications/OgrePaperv9.pdf.
  52. Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, and Geoffrey C. Fox, A Tale of Two Data-Intensive Approaches: Applications, Architectures and Infrastructure, in 3rd International IEEE Congress on Big Data Application and Experience Track. June 27-July 2, 2014. Anchorage, Alaska. http://arxiv.org/abs/1403.1528.
  53. Geoffrey Fox, Judy Qiu, and Shantenu Jha, High Performance High Functionality Big Data Software Stack, in Big Data and Extreme-scale Computing (BDEC). 2014. Fukuoka, Japan. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/fox.pdf.
  54. Judy Qiu, Shantenu Jha, Andre Luckow, and Geoffrey C. Fox, Towards HPC-ABDS: An Initial High-Performance Big Data Stack, in Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data. March 18-21, 2014. San Diego Supercomputer Center, San Diego. http://dsc.soic.indiana.edu/publications/nist-hpc-abds.pdf.

Theses:

Kaleidoscope diagram