Mike, 

Here are some DNA sequence data files for you to try out. The big file on the diskette (dataset1.exe) is a self-extracting zip file. Just run it from the diskette and point it at an empty directory on your hard disk to hold the unpacked contents. This will give you the following files: 

T1i = concatenation of 26 introns (see below)
T1e = concatenation of 29 exons (see below)
T2b & T3b = two long sequence segments from the same general dataset 
            (215,422 and 232,650 bases, respectively) 
readme.txt = this file 

The first two files (each about 2.7 MB when decompressed; see below for how I will reduce these file sizes in the future) were derived from a sequence containing 267,156 bases. Only those introns and exons were kept that a) had definite boundaries and b) were not located on the complementary strand. For example, an acceptable exon line in the GenBank file would have had the form: 
     exon            91715..92006
with no "<" or ">" symbols or "complement(...)". 
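As a rough illustration of this selection rule (not the actual program, which was written separately), the test can be sketched in Python. The function name parse_simple_feature and the regular expression are assumptions made for this example; they accept only lines of the plain "first..last" form shown above.

```python
import re

# Hypothetical pattern: feature keyword, whitespace, then "first..last".
# Anything with "<", ">" or "complement(...)" fails to match.
_SIMPLE_FEATURE = re.compile(r"^\s*(exon|intron)\s+(\d+)\.\.(\d+)\s*$")


def parse_simple_feature(line):
    """Return (kind, first, last) for an acceptable line, else None."""
    match = _SIMPLE_FEATURE.match(line)
    if match is None:
        return None
    kind, first, last = match.groups()
    return kind, int(first), int(last)
```

A line such as "     exon            91715..92006" is kept, while lines with partial boundaries or a complement(...) wrapper are rejected.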

The T1e file, for example, begins (ignoring the =====):

======
exon
 2             91715         92006 
g
c
t
c
.
.
.
======

and then has transitions like: 

======
.
.
.
a
g
a
exon
 4             93689         93986 
g
a
c
.
.
.
======

The same holds for the T1i file (but with "intron" separators).

The line after "exon" has 3 numbers: 

1) the serial number of the acceptable exon/intron as found in the GenBank FEATURES information; this serial number is created by my program (one of 3 programs I wrote to do this). You can ignore this number. 

2,3) the beginning and ending base numbers of the exon/intron. Thus to read in the next exon/intron, use a loop: 
FOR i=1 TO (last-first+1) : ... : NEXT. 
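The same reading loop can be sketched in Python, if that is more convenient. This is only a minimal sketch that assumes a well-formed file with one base per line, as in the excerpts above; the function name read_segments and the tuple layout it returns are choices made for this example.

```python
def read_segments(lines, separator="exon"):
    """Collect (serial, first, last, sequence) for each exon/intron block.

    `lines` is any iterable of text lines, such as an open file.
    Use separator="intron" for the T1i file.
    """
    it = iter(lines)
    segments = []
    for line in it:
        if line.strip() != separator:
            continue
        # The line after the separator holds serial, first and last.
        serial, first, last = (int(tok) for tok in next(it).split())
        # Read exactly (last - first + 1) bases, one per line.
        count = last - first + 1
        bases = [next(it).strip() for _ in range(count)]
        segments.append((serial, first, last, "".join(bases)))
    return segments
```

The serial number is carried along but can be ignored, as noted above. A truncated file would raise an error here rather than be handled gracefully.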

The sequence data in the remaining two files is in the following comma-delimited 2-column format: 

======
 1 , g
 2 , a
 3 , t
 4 , c
.
.
.
 267153 , g
 267154 , a
 267155 , t
 267156 , c
======
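A reader for this 2-column format might look like the following Python sketch. The function name read_indexed_bases is an assumption for this example, and the check that the first column counts up from 1 is an added safeguard, not something the files are guaranteed to need.

```python
def read_indexed_bases(lines):
    """Return the sequence as one string from lines like ' 1 , g'."""
    bases = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        index_text, base = (field.strip() for field in line.split(","))
        # Use the first column as a consistency check on the position.
        expected = len(bases) + 1
        if int(index_text) != expected:
            raise ValueError(
                "expected index %d, found %s" % (expected, index_text)
            )
        bases.append(base)
    return "".join(bases)
```

Passing an open file object works directly, since the function only iterates over lines.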

I intend to eliminate the first column (and associated baggage) in order to greatly reduce the file size. I have kept it so far as a debugging aid.  

Obviously T1i and T1e are to be used for training, and T2b and T3b are for independent testing. 

I hope this works out very well with parallel cascade. That could constitute a breakthrough. 

I can provide you with the source and executables of the 3 programs later if you wish. I need to clean them up a bit first (eliminating debugging statements that are commented out; adding some documentation; adding some more error traps and messages; etc.). 

Yours, 
Ed