The structure of hpfFrontEnd

The HPF front end is developed inside a compiler development toolkit, whose structure is reflected in our directory layout.

In src/include, only global header files are kept. "Global" means that once a .h file is put in this directory, at least two executables or library files will use it as an include file, and an explicit explanation of its contents must be given. At present, the following are considered global:

1. Internal representations of a source program: these are defined in bif.h, ll.h, symb.h, ... (we may change some filenames in this category). These structures can be used not only for Fortran but for almost any programming language.
2. Tags for the internal structure: they are defined in tags.h. (Note that only the tags for Fortran are useful at present.)
3. Structures for the .dep file. An internal representation of a source program can be considered as a set of relations among certain language components, such as statements, expressions and symbols. We call these relations dependences. We use files with the extension .dep to preserve the internal structure of a source program, so corresponding structures are needed; they are currently defined in dep_strut.h.
4. Function prototypes for basic operations on internal nodes. These are defined in makenode.h, but this file needs to be improved.

In src/lib, basic operations on internal nodes are provided. This library gives any parser or other language tool a basic interface for building and restructuring the internal representation of a given language. It also provides methods to convert between the internal structure and the .dep file of a source program.

In src/hpf2dep, a parser that converts an HPF source program into our internal representation is provided.

In src/tools, several utility tools are provided. Among them, the following two are important:

1. dumpdep: it prints every node and its corresponding fields in a .dep file as a linear list. This is the best way to display the internal structure of a given file, though the contents of the list may be a little difficult to read.
2. unparse: it is designed to unparse the internal structure of different kinds of languages back into source code. Right now, only Fortran and C are supported. (A C unparser does exist, but we have no .dep files for C at present.)

The intermediate representation

Generally speaking, a compiler for a programming language can be divided into two parts: the front end and the back end. During compilation, the front end generates the intermediate representation from the parsed source code. The intermediate representation records all the necessary information in the source program, so it can be used as input to the back end for further processing to generate machine code. Our HPF front end is a compiler front end for the High Performance Fortran language; it generates the intermediate representation for Fortran programs as defined in the HPF 1.0 specification.

The intermediate representation is a set of different kinds of nodes (defined as C structures) corresponding to the source program. For example, the following kinds of nodes are important:

* file_obj: represents the parsed source file.
* bif_node: represents a statement in the source program.
* low_level_node: represents an expression in a statement.
* symb_node: represents a symbol used in the source program.

Each kind of node has different fields, defined as C structure entries. These entries can be divided into two kinds: features and links.
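As a rough sketch only (the field names below are hypothetical, not the actual declarations in bif.h), a statement node might be laid out roughly as follows; the feature and link fields, and the linking rules they obey, are explained in the paragraphs that follow.

------------------------------
/* Illustrative sketch only; the real definitions live in bif.h, ll.h and
   symb.h, and the actual field names may differ.                        */
struct blob_node;        /* element of a list ("series") of bif nodes */
struct low_level_node;   /* expression node */
struct symb_node;        /* symbol node */

struct bif_node {
    /* feature fields: attributes of this particular node */
    int id;                          /* unique number among bif nodes */
    int variant;                     /* one of the tags from tags.h   */

    /* list link: chains all bif nodes of the program together */
    struct bif_node *list_link;

    /* tree links: grammatical structure of the parsed statement */
    struct blob_node      *control_son1;   /* first controlled statement series  */
    struct blob_node      *control_son2;   /* second controlled statement series */
    struct low_level_node *ll1;            /* up to three expressions            */
    struct low_level_node *ll2;
    struct low_level_node *ll3;
    struct symb_node      *symbol;         /* at most one symbol                 */
};
------------------------------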
The feature fields record the attributes of one specific node, such as id (a unique number that identifies a node among nodes of the same kind) and variant (labeled with predefined tags). The link fields record the relations between two nodes, so these fields are usually filled with C pointers. There are two kinds of links: list links and tree links. List links chain nodes of the same kind together, while tree links record the grammatical information of the parsed source program.

Therefore, what a front end does at parse time can be described as follows:

* reduce a grammar component from the input source file,
* make or find the corresponding nodes for this component,
* fill the tree link fields concerned with these nodes, and
* record the nodes to disk.

During the second step, if a node needs to be created (new memory is needed), the new node is linked into the list link and a new id number is assigned. In the third step, tree links are filled according to the node definition of a specific language, in our case HPF. The principles for doing this are the following (we will discuss the node definition of HPF later):

1. Each bif node can be linked to up to two series of bif nodes (linked through blob_nodes) as control sons. This reflects the control dependences in a source program.
2. Each bif node can be linked to up to three low_level nodes to represent the expressions in the statement.
3. Each bif node can be linked to only one symbol node.
4. Each low_level node can be linked to two low_level nodes (this allows the recursive definition of expressions) and one symbol node.

In the fourth step, all kinds of nodes are stored to disk. Nodes of the same kind are stored in the order of their list links, and a .dep file is created. In fact, the more important things in the file are the links, that is, the relations among all kinds of nodes. We call them dependences. (Control dependence and data dependence in program analysis are only two specific kinds of dependence in our definition.)

Putting the HPF frontend in a compiler testbed

The intermediate representation described previously can, in fact, be used to express most imperative-style languages. In our case, the language under consideration is High Performance Fortran, so a mapping between the intermediate representation and each HPF statement is defined. Different statements are labeled with different tags, and a parser was implemented to convert HPF source programs into the intermediate representation: the so-called front end.

Besides this, two important tools are implemented as accessories: dumpdep and unparse.

dumpdep walks each kind of node along its list link and dumps each of its fields to standard output. When a feature field is encountered, some integer values, such as tags, are converted to character strings. When a link field is encountered, the address pointer is replaced by the node id number.

unparse walks the tree links in root-first order and prints characters according to the definition of each node's tag. More specifically, each tag is assigned a string corresponding to its definition; during unparsing, this string is interpreted by the unparser to decide what action should be taken. This is the so-called table-driven method.
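As a minimal sketch of the table-driven idea (the tags, table entries, and field names here are made up for illustration and are not the real tags.h or unparser definitions), each tag owns a small template string, and a root-first walk over the tree links interprets it:

------------------------------
/* Minimal sketch of a table-driven unparser; all names are illustrative. */
#include <stdio.h>

struct ll_node {
    int variant;                 /* tag of this expression node         */
    const char *string_val;      /* used here only for identifier nodes */
    struct ll_node *son1, *son2;
};

/* hypothetical tag values */
enum { IDENT = 1, ADD_OP = 2, DDOT = 3 };

/* one template per tag: %1 and %2 stand for the two sons, %s for the
   node's own string; the real unparser chooses ":" or "," for DDOT
   depending on context, which is fixed here for simplicity             */
static const char *fmt_table[] = {
    [IDENT]  = "%s",
    [ADD_OP] = "%1 + %2",
    [DDOT]   = "%1 : %2",
};

/* root-first traversal: interpret the current node's template,
   recursing into the sons where the template asks for them             */
static void unparse_ll(struct ll_node *n)
{
    if (n == NULL)
        return;
    for (const char *p = fmt_table[n->variant]; *p; p++) {
        if (p[0] == '%' && p[1] == '1')      { unparse_ll(n->son1); p++; }
        else if (p[0] == '%' && p[1] == '2') { unparse_ll(n->son2); p++; }
        else if (p[0] == '%' && p[1] == 's') { fputs(n->string_val, stdout); p++; }
        else                                 { putchar(*p); }
    }
}
------------------------------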
/**** Please note that, so far, what we have said is more general than a specific HPF front end. In the following we talk more about the node definition of HPF; some of it may not be quite suitable for PKU's version, because I am not sure how you have changed hpf2dep. ****/

Node definition of HPF

The node definition of a language is a one-to-one mapping between language constructs and the intermediate representation. Since our intermediate representation allows mappings from different kinds of languages, certain ambiguities exist when we deal with a specific language.

For example, a bif node can be linked to three low level nodes. When a statement with only two expressions, such as an assignment statement, is mapped to a bif node, one of the low level nodes must be empty, but it could be any one of the three. In other words, the intermediate representation allows different candidates for the same language structure. When this happens, a decision has to be made, specifying which low level node is to be null in that case.

Another ambiguity is that different kinds of language structures can be mapped to one kind of intermediate representation. As we have said, tags are used to label the different bif nodes and low level nodes, and unparse prints characters according to the tag of a node. So we have to make sure that the same tag is treated differently when it is used in the situations above. For example, DDOT is a low level tag; when it is used inside a FORALL statement, ":" is used to separate its sub-items, otherwise "," is used.

Bearing these two things in mind, one can give the node definitions of different languages. But even for the same language, there can be different kinds of definitions; our definition for HPF is only one of them.

/**** Some definitions are too difficult for the unparse program. For example, the following used to be the definition for the ALIGN statement.
------------------------------
The keywords ALIGN, DISTRIBUTE, DYNAMIC, INHERIT play the roles of attributes. So we add four new variants, which are ALIGN_OP, DISTRIBUTE_OP, DYNAMIC_OP, and INHERIT_OP. The corresponding ll_nodes belong to CLASS SgAttributeExp. Although we think ALIGN_ATTR may be a better name than ALIGN_OP, we use ALIGN_OP according to the style of Sage++.

for ALIGN_OP :

            SgAttributeExp
          (variant=ALIGN_OP)
                  |
                  |
         +----------------+
         |                |
         |                |
   SgAlignAttrExp       (null)
  (variant=ALIGN_OP)
         |
         |
  +----------------+
  |                |
  |                |
align_source_list  align_with_clause
  (EXPR_LIST)       (ALIGN_WITH_OP)
---------------------------------
Look at the definition of ALIGN_OP: it is ambiguous. One ALIGN_OP has its sons defined as an ALIGN_OP and a null; the other ALIGN_OP has its sons defined as align_source_list and align_with_clause. Suppose you are going to unparse it, traveling the tree in root-first order: what do you do when you meet an ll_node tagged ALIGN_OP? To unparse the tree accurately, you have to remember which ALIGN_OP you are meeting, say, with a stack. That is why I call it too difficult for unparse. In fact, the only thing we need to do is define another tag to substitute for the top ALIGN_OP, and then the unparsing work (and any future work that travels the tree in root-first order) becomes much easier. A similar kind of problem exists in 1.0. So I added four tags in tags.h:

#define ALIGN_ATT            835
#define DISTRIBUTE_ATT       836
#define CBREALIGN_STMT       845
#define CBREDISTRIBUTE_STMT  846

grep fD.y in 1.20, and you will find how these four tags are used.
****/
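As a rough illustration of why the extra tag helps (only ALIGN_ATT 835 comes from tags.h above; the ALIGN_OP value and the function and field names below are hypothetical), a root-first unparser can then decide what to do from the tag alone, with no stack of enclosing nodes:

------------------------------
/* Illustrative only: with a distinct tag on the outer attribute node, the
   tag alone tells a root-first unparser which shape of node it has met.  */
#include <stdio.h>

#define ALIGN_ATT 835      /* outer SgAttributeExp node (from tags.h)     */
#define ALIGN_OP  832      /* inner SgAlignAttrExp node; value is made up */

struct ll_node {
    int variant;
    struct ll_node *son1, *son2;
};

void unparse_expr(struct ll_node *n);   /* generic expression unparser, assumed */

void unparse_align(struct ll_node *n)
{
    switch (n->variant) {
    case ALIGN_ATT:                  /* outer node: print the keyword, recurse   */
        printf("ALIGN ");
        unparse_align(n->son1);      /* son2 is the (null) son                   */
        break;
    case ALIGN_OP:                   /* inner node: source list, then WITH clause */
        unparse_expr(n->son1);       /* align_source_list (EXPR_LIST)            */
        printf(" ");
        unparse_expr(n->son2);       /* align_with_clause (ALIGN_WITH_OP)        */
        break;
    }
}
------------------------------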