I will draw some sketches below. I hope you can understand them.

1. Model for computation and I/O

        parallel compute nodes (upper applications/clients)

         ______      ______      ______      ______
        | cn1  |    | cn2  |    | cn3  |    | cn4  |
        |______|    |______|    |______|    |______|
             \          \          /          /
              \          \        /          /
      ________________________________________________
     |       my I/O base (consists of I/O nodes)      |
     |________________________________________________|
              /          /        \          \
             /          /          \          \
         _______     _______     _______     _______
        | disk1 |   | disk2 |   | disk3 |   | disk4 |
        |_______|   |_______|   |_______|   |_______|

(Note: I just enumerate four compute nodes and four disks arbitrarily.)

2. Mapping global matrix data structures onto scattered disks

(1) Assume that I need to maintain a matrix A(4x4) as below. In general, it can be any matrix A(mxn).

    A(4x4) = [ a11 a12 a13 a14 ]
             [ a21 a22 a23 a24 ]
             [ a31 a32 a33 a34 ]
             [ a41 a42 a43 a44 ]

(2) Appearance of the logical file (linear, row-major for example)

 _________________________________________________________________
| a11 a12 a13 a14 a21 a22 a23 a24 a31 a32 a33 a34 a41 a42 a43 a44 |
|_________________________________________________________________|

(3) Normal method of partitioning the above global data structure among the local disks

       disk1               disk2               disk3               disk4
 _________________   _________________   _________________   _________________
| a11 a12 a13 a14 | | a21 a22 a23 a24 | | a31 a32 a33 a34 | | a41 a42 a43 a44 |
|_________________| |_________________| |_________________| |_________________|

(4) Potential disadvantage of the above normal method (partitioning in row-major/column-major order)

For instance, a compute node (application) requires the matrix elements by column, while the data are stored in row-major order across the scattered disks.
                           compute node
                 _____    _____    _____    _____
                | a11 |  | a12 |  | a13 |  | a14 |
                |_____|  |_____|  |_____|  |_____|
                | a21 |  | a22 |  | a23 |  | a24 |
                |_____|  |_____|  |_____|  |_____|
                | a31 |  | a32 |  | a33 |  | a34 |
                |_____|  |_____|  |_____|  |_____|
                | a41 |  | a42 |  | a43 |  | a44 |
                |_____|  |_____|  |_____|  |_____|
                   |
                   |   (column 1 is gathered from all four disks)
         +---------+---------+-------------------+-------------------+
         |                   |                   |                   |
        a11                 a21                 a31                 a41
         |                   |                   |                   |
 _________________   _________________   _________________   _________________
| a11 a12 a13 a14 | | a21 a22 a23 a24 | | a31 a32 a33 a34 | | a41 a42 a43 a44 |
|_________________| |_________________| |_________________| |_________________|
                                    disks

It is obvious that, to meet the client's requirements, we have to do 4x4 = 16 data transfers: each of the 4 columns needs one transfer from each of the 4 disks. Of course, the above example is an extremely bad one. However, I want to devise a method that is a compromise and is suitable in most circumstances. I will state my method in the next mail soon.

(5) Mapping global matrix data among disks in "BLOCKS"

Still, assume that I need to maintain a matrix A(4x4) as below.

    A(4x4) = [ a11 a12 a13 a14 ]
             [ a21 a22 a23 a24 ]
             [ a31 a32 a33 a34 ]
             [ a41 a42 a43 a44 ]

I can divide it into blocks as follows:

              [ a11 a12 | a13 a14 ]
     block1   [ a21 a22 | a23 a24 ]   block2
              ----------+----------
     block3   [ a31 a32 | a33 a34 ]   block4
              [ a41 a42 | a43 a44 ]

Appearance of the logical file (linear, ordered by "blocks")

      block1          block2          block3          block4
 _________________________________________________________________
| a11 a12 a21 a22 a13 a14 a23 a24 a31 a32 a41 a42 a33 a34 a43 a44 |
|_________________________________________________________________|

"Blocks" method of partitioning the above global data structure among the local disks:
       disk1               disk2               disk3               disk4
 _________________   _________________   _________________   _________________
| a11 a12 a21 a22 | | a13 a14 a23 a24 | | a31 a32 a41 a42 | | a33 a34 a43 a44 |
|_________________| |_________________| |_________________| |_________________|

For instance, a compute node (application) requires the matrix elements by column, while the data are stored in blocks across the scattered disks.

                           compute node
                 _____    _____    _____    _____
                | a11 |  | a12 |  | a13 |  | a14 |
                |_____|  |_____|  |_____|  |_____|
                | a21 |  | a22 |  | a23 |  | a24 |
                |_____|  |_____|  |_____|  |_____|
                | a31 |  | a32 |  | a33 |  | a34 |
                |_____|  |_____|  |_____|  |_____|
                | a41 |  | a42 |  | a43 |  | a44 |
                |_____|  |_____|  |_____|  |_____|
                   |
                   |   (column 1 is gathered from only two disks)
         +---------+-----------------------------+
         |                                       |
      a11 a21                                 a31 a41
         |                                       |
 _________________   _________________   _________________   _________________
| a11 a12 a21 a22 | | a13 a14 a23 a24 | | a31 a32 a41 a42 | | a33 a34 a43 a44 |
|_________________| |_________________| |_________________| |_________________|
                                    disks

It is obvious that, to meet the client's requirements, we only have to do 4x2 = 8 data transfers: each of the 4 columns needs one transfer from each of 2 disks.

How can I give a formal proof of this idea to show that it is effective?

I have to leave now because the lab will be closed until 2:00 pm this afternoon.

Judy
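
The two counts above (16 for the row layout, 8 for the 2x2 block layout) can be checked with a small script. This is only a sketch under stated assumptions, not a proof: it assumes that all elements of one requested column (or row) that sit on the same disk are fetched in a single transfer, that indices are 0-based, and that the block sizes divide the matrix dimensions evenly. The names row_owner, make_block_owner and count_transfers are made up for this illustration.

```python
def row_owner(i, j):
    """Row layout from (3): row i (0-based) is stored on disk i."""
    return i


def make_block_owner(n, br, bc):
    """Block layout from (5): br x bc blocks, numbered left to right,
    top to bottom, one block per disk. Assumes bc divides n."""
    blocks_per_row = n // bc

    def owner(i, j):
        return (i // br) * blocks_per_row + (j // bc)

    return owner


def count_transfers(m, n, owner, by="column"):
    """Count transfers needed to deliver every column (or every row).

    Assumption: one transfer per distinct disk touched by each
    requested column (or row)."""
    total = 0
    if by == "column":
        for j in range(n):
            total += len({owner(i, j) for i in range(m)})
    else:
        for i in range(m):
            total += len({owner(i, j) for j in range(n)})
    return total


block_owner = make_block_owner(n=4, br=2, bc=2)

print(count_transfers(4, 4, row_owner, by="column"))    # 16
print(count_transfers(4, 4, block_owner, by="column"))  # 8
print(count_transfers(4, 4, row_owner, by="row"))       # 4
print(count_transfers(4, 4, block_owner, by="row"))     # 8
```

Under these assumptions the row layout costs 4 transfers for row access but 16 for column access, while the 2x2 block layout costs 8 in both cases, which is the kind of compromise the block method is aiming for. Counting the same way for a general A(mxn) with br x bc blocks gives n*(m/br) transfers for column access and m*(n/bc) for row access.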