Cinteny primarily requires two kinds of data: marker information and homologous groups. Common names for the genomes are optional; if they are not uploaded, the taxId is used as the name of the genome.
All files are tab-delimited text files. The columns of each file are described below. If data is not available for an attribute, put a '-' in its place. The geneId in the marker information file and the homologs file must be the same for a given gene.
Marker
The following information is required for each marker:
- taxId
- to indicate the genome it belongs to; should be numeric
- chromosome
- to find the locality
- marker start position
- with respect to the chromosome
- marker end position
- again, with respect to the chromosome
- orientation
- orientation of the marker on the chromosome
- symbol
- unique id
- for genes, it is the geneId
Sample Data:
9606	14	60858268	61087451	+	PRKCH	5583
9606	14	61107009	61191177	+	LOC400221	400221
9606	14	61231992	61284729	+	HIF1A	3091
9606	14	61298920	61332899	+	SNAPC1	6617
9606	14	61368391	61368840	-	LOC122867	122867
9606	14	61400683	61402320	+	MOCS3P	326305
Sample Files: human.dat, mouse.dat
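As an illustration of the marker layout described above, the following is a minimal Python sketch of a reader for such a file, not part of Cinteny itself. The names `Marker` and `parse_markers` are hypothetical. It assumes the seven columns appear in the order listed, that the chromosome may be non-numeric (so it is kept as text), and that a '-' in the orientation column means the reverse strand rather than missing data.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Marker(NamedTuple):
    tax_id: int
    chromosome: str          # kept as text; chromosomes such as "X" are not numeric
    start: int
    end: int
    orientation: str         # '+' or '-'; '-' is assumed to mean reverse strand here
    symbol: Optional[str]    # None when the file holds the '-' placeholder
    unique_id: str


def parse_markers(lines: Iterable[str]) -> Iterator[Marker]:
    """Yield one Marker per non-empty, tab-delimited line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        tax_id, chrom, start, end, orient, symbol, uid = fields
        yield Marker(
            int(tax_id),
            chrom,
            int(start),
            int(end),
            orient,
            None if symbol == "-" else symbol,
            uid,
        )


sample = [
    "9606\t14\t60858268\t61087451\t+\tPRKCH\t5583",
    "9606\t14\t61368391\t61368840\t-\tLOC122867\t122867",
]
markers = list(parse_markers(sample))
```

Passing an open file handle instead of the `sample` list works the same way, since the function only iterates over lines.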
Homologous Groups
The following information is required for each homologous marker. All the columns are required.
- HID
- ID of the homologous group it belongs to; should be numeric
- taxId
- genome it belongs to; should be numeric
- unique id
- for genes, it is the geneId; must be the same as in the marker file
- symbol
- same as the marker file
- protein GI
- ignored (not used currently)
- protein accession
- ignored (not used currently)
Sample Data:
74567	9606	4917	NTN2L	5453810	NP_006172.1
74567	10090	18209	Ntn3	6754904	NP_035077.1
74567	10116	114524	Ntn3	34871100	XP_343868.1
74567	7227	32400	NetB	17530937	NP_511155.1
74569	9606	5583	PRKCH	28557781	NP_006246.2
74569	10090	18755	Prkch	31543511	NP_032882.2
74569	10116	81749	Prkch	13592027	NP_112347.1
Sample File: homologshm.dat
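Because the unique id in the homologs file must match the one in the marker file, it can be worth checking the two files against each other before uploading. The following Python sketch shows one way to do that; it is an illustration under stated assumptions, not Cinteny's own validation. The function names `parse_homologs` and `missing_from_markers` are hypothetical, and `marker_ids` stands in for the (taxId, unique id) pairs collected from the marker files.

```python
from collections import defaultdict


def parse_homologs(lines):
    """Group (taxId, unique id, symbol) records by HID."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        # Protein GI and protein accession are read but ignored.
        hid, tax_id, unique_id, symbol, _gi, _accession = fields
        groups[int(hid)].append((int(tax_id), unique_id, symbol))
    return dict(groups)


def missing_from_markers(groups, marker_ids):
    """Return sorted (taxId, unique id) pairs that have no matching marker."""
    return sorted(
        (tax_id, uid)
        for members in groups.values()
        for tax_id, uid, _symbol in members
        if (tax_id, uid) not in marker_ids
    )


homolog_lines = [
    "74569\t9606\t5583\tPRKCH\t28557781\tNP_006246.2",
    "74569\t10090\t18755\tPrkch\t31543511\tNP_032882.2",
    "74567\t9606\t4917\tNTN2L\t5453810\tNP_006172.1",
]
groups = parse_homologs(homolog_lines)

# Hypothetical set of ids gathered from the marker files.
marker_ids = {(9606, "5583"), (10090, "18755")}
missing = missing_from_markers(groups, marker_ids)
```

Here `missing` lists the homolog entries that have no counterpart among the marker ids, which is the kind of mismatch the matching-id requirement is meant to prevent.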
Common Names
This is an optional file. The output is easier to interpret when common names are provided. The following information is required for each genome:
- taxId
- taxId of the genome
- common name
- could be the common name or the scientific name
Sample Data:
9606	Human
9598	Chimp
9615	Dog
10090	Mouse
10116	Rat
9031	Chicken
7227	Droso
Sample File: names.dat
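The fallback from common name to taxId can be sketched in a few lines of Python. This is only an illustration: `load_common_names` and `genome_label` are hypothetical names, and applying the fallback per genome (rather than only when the whole file is absent) is an assumption.

```python
def load_common_names(lines):
    """Map each numeric taxId to its common (or scientific) name."""
    names = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split on the first tab only, so names may contain spaces.
        tax_id, name = line.split("\t", 1)
        names[int(tax_id)] = name
    return names


def genome_label(tax_id, names):
    """Return the common name if one was provided, else the taxId as text."""
    return names.get(tax_id, str(tax_id))


names = load_common_names(["9606\tHuman", "10090\tMouse"])
human_label = genome_label(9606, names)    # a name was provided
fly_label = genome_label(7227, names)      # no name, falls back to the taxId
```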