The structure of hpfFrontEnd

The HPF front end is developed inside a compiler development toolkit, whose structure is reflected in our directory layout.

In src/include, only global header files are kept. "Global" means that once a .h file is put in this directory, at least two executables or library files will use it as an include file, and an explicit explanation of its contents must be given. At present, the following are considered global:

1. Internal representations of a source program: these are defined in bif.h, ll.h, symb.h, ... (we may change some filenames in this category). These structures can be used not only for Fortran but for almost any programming language.
2. Tags for the internal structure: they are defined in tags.h. (Note that only the tags for Fortran are useful at present.)
3. Structures for the .dep file. An internal representation of a source program can be considered as a set of relations among certain language components, such as statements, expressions and symbols. We call these relations dependences. We use files with the extension .dep to preserve the internal structure of a source program, so corresponding structures are needed; they are currently defined in dep_strut.h.
4. Function prototypes for basic operations on internal nodes. These are defined in makenode.h, but this file needs to be improved.

In src/lib, basic operations on internal nodes are provided. This library gives any parser or other language tool a basic interface for building and restructuring the internal representation of a given language. It also provides methods to convert between the internal structure and the .dep file of a source program.

In src/hpf2dep, a parser that converts an HPF source program into our internal representation is provided.

In src/tools, several utility tools are provided. Among them, the following two are important:

1. dumpdep: it prints every node and its corresponding fields in a .dep file as a linear list. This is the best way to display the internal structure of a given file, though the contents of the list may be a little difficult to read.
2. unparse: it is designed to unparse the internal structure of different kinds of languages back into source code. Right now, only Fortran and C are supported. (A C unparser does exist, but we have no .dep files for C at present.)

The intermediate representation

Generally speaking, a compiler for a programming language can be divided into two parts: the front end and the back end. During compilation, the front end generates the intermediate representation from the parsed source code. The intermediate representation records all the necessary information in the source program, so it can be used as input to the back end for further processing to generate machine code. Our HPF front end is a compiler front end for the High Performance Fortran language; it generates the intermediate representation for Fortran programs as defined in the HPF 1.0 specification.

The intermediate representation is a set of different kinds of nodes (defined as C structures) corresponding to the source program. For example, the following kinds of nodes are important:

* file_obj: represents the parsed source file.
* bif_node: represents a statement in the source program.
* low_level_node: represents an expression in a statement.
* symb_node: represents a symbol used in the source program.

Each kind of node has different fields, defined as C structure entries. These entries can be divided into two kinds: features and links.
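As a rough sketch only (the field names below are hypothetical, not the actual declarations in bif.h), a statement node might be laid out roughly as follows; the feature and link fields, and the linking rules they obey, are explained in the paragraphs that follow.

------------------------------
/* Illustrative sketch only; the real definitions live in bif.h, ll.h and
   symb.h, and the actual field names may differ.                        */
struct blob_node;        /* element of a list ("series") of bif nodes */
struct low_level_node;   /* expression node */
struct symb_node;        /* symbol node */

struct bif_node {
    /* feature fields: attributes of this particular node */
    int id;                          /* unique number among bif nodes */
    int variant;                     /* one of the tags from tags.h   */

    /* list link: chains all bif nodes of the program together */
    struct bif_node *list_link;

    /* tree links: grammatical structure of the parsed statement */
    struct blob_node      *control_son1;   /* first controlled statement series  */
    struct blob_node      *control_son2;   /* second controlled statement series */
    struct low_level_node *ll1;            /* up to three expressions            */
    struct low_level_node *ll2;
    struct low_level_node *ll3;
    struct symb_node      *symbol;         /* at most one symbol                 */
};
------------------------------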
The feature fields record the attributes of one specific node, such as id (a unique number that identifies a node among nodes of the same kind) and variant (labeled with predefined tags). The link fields record the relations between two nodes, so these fields are usually filled with C pointers. There are two kinds of links: list links and tree links. List links chain nodes of the same kind together, while tree links record the grammatical information of the parsed source program.

Therefore, what a front end does at parse time can be described as follows:

* reduce a grammar component from the input source file,
* make or find the corresponding nodes for this component,
* fill the tree link fields concerned with these nodes, and
* record the nodes to disk.

During the second step, if a node needs to be created (new memory is needed), the new node is linked into the list link and a new id number is assigned. In the third step, tree links are filled according to the node definition of a specific language, in our case HPF. The principles for doing this are the following (we will discuss the node definition of HPF later):

1. Each bif node can be linked to up to two series of bif nodes (linked through blob_nodes) as control sons. This reflects the control dependences in a source program.
2. Each bif node can be linked to up to three low_level nodes to represent the expressions in the statement.
3. Each bif node can be linked to only one symbol node.
4. Each low_level node can be linked to two low_level nodes (this allows the recursive definition of expressions) and one symbol node.

In the fourth step, all kinds of nodes are stored to disk. Nodes of the same kind are stored in the order of their list links, and a .dep file is created. In fact, the more important things in the file are the links, that is, the relations among all kinds of nodes. We call them dependences. (Control dependence and data dependence in program analysis are only two specific kinds of dependence in our definition.)

Putting the HPF frontend in a compiler testbed

The intermediate representation described previously can, in fact, be used to express most imperative-style languages. In our case, the language under consideration is High Performance Fortran, so a mapping between the intermediate representation and each HPF statement is defined. Different statements are labeled with different tags, and a parser was implemented to convert HPF source programs into the intermediate representation: the so-called front end.

Besides this, two important tools are implemented as accessories: dumpdep and unparse.

dumpdep walks each kind of node along its list link and dumps each of its fields to standard output. When a feature field is encountered, some integer values, such as tags, are converted to character strings. When a link field is encountered, the address pointer is replaced by the node id number.

unparse walks the tree links in root-first order and prints characters according to the definition of each node's tag. More specifically, each tag is assigned a string corresponding to its definition; during unparsing, this string is interpreted by the unparser to decide what action should be taken. This is the so-called table-driven method.
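As a minimal sketch of the table-driven idea (the tags, table entries, and field names here are made up for illustration and are not the real tags.h or unparser definitions), each tag owns a small template string, and a root-first walk over the tree links interprets it:

------------------------------
/* Minimal sketch of a table-driven unparser; all names are illustrative. */
#include <stdio.h>

struct ll_node {
    int variant;                 /* tag of this expression node         */
    const char *string_val;      /* used here only for identifier nodes */
    struct ll_node *son1, *son2;
};

/* hypothetical tag values */
enum { IDENT = 1, ADD_OP = 2, DDOT = 3 };

/* one template per tag: %1 and %2 stand for the two sons, %s for the
   node's own string; the real unparser chooses ":" or "," for DDOT
   depending on context, which is fixed here for simplicity             */
static const char *fmt_table[] = {
    [IDENT]  = "%s",
    [ADD_OP] = "%1 + %2",
    [DDOT]   = "%1 : %2",
};

/* root-first traversal: interpret the current node's template,
   recursing into the sons where the template asks for them             */
static void unparse_ll(struct ll_node *n)
{
    if (n == NULL)
        return;
    for (const char *p = fmt_table[n->variant]; *p; p++) {
        if (p[0] == '%' && p[1] == '1')      { unparse_ll(n->son1); p++; }
        else if (p[0] == '%' && p[1] == '2') { unparse_ll(n->son2); p++; }
        else if (p[0] == '%' && p[1] == 's') { fputs(n->string_val, stdout); p++; }
        else                                 { putchar(*p); }
    }
}
------------------------------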
/**** Please note that, so far, what we have said is more general than a specific HPF front end. In the following we talk more about the node definition of HPF; some of it may not be quite suitable for PKU's version, because I am not sure how you have changed hpf2dep. ****/

Node definition of HPF

The node definition of a language is a one-to-one mapping between language constructs and the intermediate representation. Since our intermediate representation allows mappings from different kinds of languages, certain ambiguities exist when we deal with a specific language.

For example, a bif node can be linked to three low level nodes. When a statement with only two expressions, such as an assignment statement, is mapped to a bif node, one of the low level nodes must be empty, but it could be any one of the three. In other words, the intermediate representation allows different candidates for the same language structure. When this happens, a decision has to be made, specifying which low level node is to be null in that case.

Another ambiguity is that different kinds of language structures can be mapped to one kind of intermediate representation. As we have said, tags are used to label the different bif nodes and low level nodes, and unparse prints characters according to the tag of a node. So we have to make sure that the same tag is treated differently when it is used in the situations above. For example, DDOT is a low level tag; when it is used inside a FORALL statement, ":" is used to separate its sub-items, otherwise "," is used.

Bearing these two things in mind, one can give the node definitions of different languages. But even for the same language, there can be different kinds of definitions; our definition for HPF is only one of them.

/**** Some definitions are too difficult for the unparse program. For example, the following used to be the definition for the ALIGN statement.
------------------------------
The keywords ALIGN, DISTRIBUTE, DYNAMIC, INHERIT play the roles of attributes. So we add four new variants, which are ALIGN_OP, DISTRIBUTE_OP, DYNAMIC_OP, and INHERIT_OP. The corresponding ll_nodes belong to CLASS SgAttributeExp. Although we think ALIGN_ATTR may be a better name than ALIGN_OP, we use ALIGN_OP according to the style of Sage++.

for ALIGN_OP :

            SgAttributeExp
          (variant=ALIGN_OP)
                  |
                  |
         +----------------+
         |                |
         |                |
   SgAlignAttrExp       (null)
  (variant=ALIGN_OP)
         |
         |
  +----------------+
  |                |
  |                |
align_source_list  align_with_clause
  (EXPR_LIST)       (ALIGN_WITH_OP)
---------------------------------
Look at the definition of ALIGN_OP: it is ambiguous. One ALIGN_OP has its sons defined as an ALIGN_OP and a null; the other ALIGN_OP has its sons defined as align_source_list and align_with_clause. Suppose you are going to unparse it, traveling the tree in root-first order: what do you do when you meet an ll_node tagged ALIGN_OP? To unparse the tree accurately, you have to remember which ALIGN_OP you are meeting, say, with a stack. That is why I call it too difficult for unparse. In fact, the only thing we need to do is define another tag to substitute for the top ALIGN_OP, and then the unparsing work (and any future work that travels the tree in root-first order) becomes much easier. A similar kind of problem exists in 1.0. So I added four tags in tags.h:

#define ALIGN_ATT            835
#define DISTRIBUTE_ATT       836
#define CBREALIGN_STMT       845
#define CBREDISTRIBUTE_STMT  846

grep fD.y in 1.20, and you will find how these four tags are used.
****/
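As a rough illustration of why the extra tag helps (only ALIGN_ATT 835 comes from tags.h above; the ALIGN_OP value and the function and field names below are hypothetical), a root-first unparser can then decide what to do from the tag alone, with no stack of enclosing nodes:

------------------------------
/* Illustrative only: with a distinct tag on the outer attribute node, the
   tag alone tells a root-first unparser which shape of node it has met.  */
#include <stdio.h>

#define ALIGN_ATT 835      /* outer SgAttributeExp node (from tags.h)     */
#define ALIGN_OP  832      /* inner SgAlignAttrExp node; value is made up */

struct ll_node {
    int variant;
    struct ll_node *son1, *son2;
};

void unparse_expr(struct ll_node *n);   /* generic expression unparser, assumed */

void unparse_align(struct ll_node *n)
{
    switch (n->variant) {
    case ALIGN_ATT:                  /* outer node: print the keyword, recurse   */
        printf("ALIGN ");
        unparse_align(n->son1);      /* son2 is the (null) son                   */
        break;
    case ALIGN_OP:                   /* inner node: source list, then WITH clause */
        unparse_expr(n->son1);       /* align_source_list (EXPR_LIST)            */
        printf(" ");
        unparse_expr(n->son2);       /* align_with_clause (ALIGN_WITH_OP)        */
        break;
    }
}
------------------------------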