A Dataflow-based Software Integration Model in Parallel and Distributed Computing and Applications

by
Gang Cheng

B.S., Huazhong University of Science and Technology, China
M.S., Huazhong University of Science and Technology, China

Abstract

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in School of Computer Information and Science in the Graduate School of Syracuse University

April

Approved: Professor Geoffrey C. Fox
Date:

Copyright Gang Cheng

Abstract of Dissertation

As an important component of parallel software engineering, software integration plays an indispensable role in building large-scale, multi-disciplinary parallel and distributed applications on high performance computing systems. To make parallel computing systems truly usable, at least as usable as today's conventional computers, the software engineering methodologies and techniques that have proven useful and critical in the software development life cycle should be brought into the process of parallel software development and system integration in the HPCC environment.

In this study, we propose a software integration framework, based on the dataflow programming paradigm and execution model, for general-purpose HPDC applications. We investigate and illustrate the advantages of this dataflow-based model, as well as the system issues in applying the integration framework to a set of HPDC application areas. We examine applications of this model in several specific yet typical areas that have general needs for software integration and are well suited to the dataflow model. The areas examined include integration of multiple parallel programming paradigms on a MIMD parallel system, integration of interactive data visualization and parallel computations in a heterogeneous distributed high performance environment, integration of networking
navigation and parallel relational database management system (RDBMS), and integration of World-Wide-Web computing and database processing. Drawing upon the software and system integration issues in those areas, we present the general requirements for a software integration framework in an HPDC application. By examining and implementing several serious real-world applications built on the proposed framework, we conduct extensive case studies that reveal issues in software integration on HPDC systems and demonstrate the effectiveness of this dataflow-based model.

By viewing different parallel programming paradigms as essentially heterogeneous approaches to mapping "real-world" problems to parallel systems, we discuss methodologies for integrating multiple programming models on a massively parallel system such as the Connection Machine CM-5. Using a dataflow-based integration model built into the visualization software AVS, we describe a simple, effective and modular way to couple sequential, data-parallel and explicit message-passing modules into an integrated parallel programming environment on a CM-5. Two applications, in computational electromagnetics and earth science respectively, are studied, and a prototype system is built and carefully evaluated to demonstrate the integration of data-parallel and message-passing modules in the proposed multi-paradigm programming environment.

Network-based concurrent computing and interactive data visualization are two important components in industry applications of high performance computing and communication. In the study of integrating interactive data visualization and parallel processing, based on the proposed integration framework, we developed a set of interactive remote visualization systems on heterogeneous parallel and distributed computers in the areas of financial modeling, computational electromagnetics and computational chemistry. Using the dataflow model of the AVS testbed in three case studies, we demonstrate a simple, effective and modular
approach in the proposed framework to couple parallel simulation modules into an interactive remote visualization environment. For some of the application systems we have built, we describe the whole software development cycle in detail, including parallel algorithm design and optimization, performance analysis, user interface design, and system/software integration into an integrated computing and visualization environment.

For the integration of a parallel relational database server and Web-based networking navigation in a client-server model, we build a powerful web-based collaborative information system that facilitates and archives Internet email-oriented collaboration in the forms of USENET newsgroup articles, mailing lists, personal mailbox emails, chat and bulletin board messages. It has a hypertext-based search interface that gives access to a variety of search capabilities built on the structured and non-structured attributes of email information, including search by keyword, subject, sender, date, thread, URL, etc. The web-based system enables global networking navigation in the information space on the Internet and provides a uniform integrated search GUI from any platform. It can change the traditional ways people conduct email-based collaboration, such as using news and chat clients and servers, subscribing to a mailing list, and searching and archiving emails in a mailbox. We describe the general architecture, the system integration issues and our development effort in building such systems.

Acknowledgements

I am very grateful for Prof. Geoffrey Fox's advice and continuous support and encouragement of my work. (more to add)

Contents

Acknowledgements
List of Tables
List of Figures

Introduction
    Meta Problems and Meta Computers
    Software Integration
    System and Software Integration in the Mapping Between Meta Problem and Meta Computer
    General Requirements in a Large-scale HPCC Application
A General Software Integration Model for Parallel and Distributed Computing and Applications
    Introduction to Data Flow
        Dataflow model as a general programming paradigm
        Dataflow model as a restricted message-passing programming method
        Dataflow as the overall programming/execution model/framework
    A General Model Based on Dataflow Framework
    Application Visualization System - A Testbed of Software Integration for HPDC Applications

Integrating Multiple Parallel Programming Paradigms
    Introduction
    Multi-paradigm Parallel Programming Environments
        Multi-paradigm Programming in Mapping Problem Domain to Parallel System
        Multi-paradigm Programming
        Parallel Programming on Connection Machine CM5
    AVS - A Dataflow Based Integration Tool for Multi-paradigm Programming on CM5
        The Application Visualization System
        A General Parallel Programming Environment on the CM5 with AVS
        Limitations in the AVS Integration Model
    A Case Study: Comparison of Numerical Advection Models
    Conclusion

Integrating Data Visualization and Parallel Computations
    Introduction
    A Distributed Visualization and Computing Model in AVS
        Application Visualization System
        A Dataflow-based Visualization and Computing Model
    A Financial Modeling Application
        Introduction and Problem Description
        System Configuration and Integration
        Performance Analysis
    A Computational Electromagnetic Application
        Introduction and Problem Description
        System Configuration and Integration
        Performance Analysis
    A Computational Chemistry Application
        Introduction and Problem Description
        Parallelization of MOPAC
        System Configuration and Integration
        Performance Analysis
    Conclusion

Integrating Web-based networking navigation and relational database management system in a client-server model

Integrating distributed computing and web servers in a data-flow (client-server) model

Prototype systems and demonstrations

Bibliography

List of Tables

Experimental timings for the option price modeling (in seconds)
Timings of calculations and communications for the EMS (in seconds)
IBM SP timing of kernel subroutines (in seconds)
The speed-up of the test set on Sun cluster