Community Interaction and Outreach: We intend to launch a mix of activities ranging from interactions within the deep learning community to a broad outreach program. The central deliverable is a productive Python-based interface to high-performance tools that enables a spectrum of deep learning research. The Stanford team in RaPyDLI has defined initial requirements for both the functionality and the interface of this system. They have also defined an initial ImageNet-based benchmark problem. We will iterate on these design choices with the deep learning community during the first six months of the project. This process will also identify groups interested in being early users of RaPyDLI. Our outreach will involve a simple web resource, social media (including email lists), and activities (e.g., birds-of-a-feather sessions and posters) at selected conferences. Because the development is open, we will continue these interactions. The next major targeted outreach will come after the first release of RaPyDLI. Here we will work with the identified early users to gather feedback on all aspects of the system as they emerge in use, from particular bugs to missing functionality. The community activities will change when the first release of a stable software environment with the benchmark is available and deployed on suitable machines, including XSEDE. This will coincide with papers and posters on RaPyDLI. The initial release will come with documentation and an early tutorial, including a version using pre-recorded videos and a MOOC-style presentation. This approach has been used for software systems and online data science classes at IU as well as for FutureGrid. It allows many courses to easily include use of this software with a quality shared description of its proper handling. All three partners will use RaPyDLI in courses in computer/data science.
We will use this early experience with users to firm up both the software and its associated support so that we can offer tutorials at both cyberinfrastructure and machine learning conferences, an activity involving all three partners. Candidates include SC, HPDC, CCGrid, and eScience for cyberinfrastructure/HPC and ICLR, ICML, and NIPS for deep learning. Note that this includes dedicated tutorials on RaPyDLI, inclusion of materials in more general tutorials (such as Stanford’s existing deep learning tutorials), and tutorials on the techniques employed by UTK in the design and optimization of the RaPyDLI library. Again, feedback from these events will be folded back into the design and implementation of RaPyDLI. We plan workshops in Years 1 and 3, with about 20 funded participants at each, corresponding respectively to gathering requirements and to refining and explaining the initial release.

Contributions to Research Infrastructure: In software packages such as BLAS, LAPACK, ScaLAPACK, PAPI, and ATLAS, the UTK team in particular has proved to be especially adept at designing software infrastructure that is not only widely adopted and used by the research community, but also facilitates and supports ongoing community contributions. Building on that work, we will encourage successful users of RaPyDLI to contribute their software (and exemplar datasets) to a repository that we will set up. As the project matures we will consider moving it to Apache. In addition to leveraging UTK’s long experience in building durable software infrastructure, working with the Apache community will provide a well-established approach to sustainable software, allowing RaPyDLI to be related to both the infrastructure (YARN, HBase, Hadoop, Spark) and libraries (Mahout, MLlib, MLbase) of the Apache Big Data stack. Separately, we are working on integrating HPC technologies with Apache so that high-performance environments like RaPyDLI can be incorporated into that stack.

Engagement with Underserved Communities: Deep learning has broad appeal and we intend to include RaPyDLI work in the broader impact programs of the partner universities. At Indiana University we will tap into the undergraduate research programs organized by the School of Informatics and Computing. We will also use our excellent relations with Minority Serving Institution (MSI) national organizations supporting American Indian (AIHEC), African American (NAFEO), and Hispanic (HACU) colleges in general, and our particular links to institutions and organizations including ECSU (Elizabeth City State), JSU (Jackson State), and ADMI (Association of Computer/Information Sciences and Engineering Departments at Minority Institutions). These relations have been established through particular projects and by a broad MSI-CIEC (Minority Serving Institutions Cyberinfrastructure Empowerment Coalition) initiative co-founded by Dr. Geoffrey Fox, which gives us access to a broad range of undergraduates with a strong focus on MSIs.

RaPyDLI Collaborators