Data files


Cinteny requires two kinds of data: marker information and homologous groups. A file of common genome names is optional; if it is not uploaded, the taxId is used as the name of the genome.

All files are tab-delimited text files. The columns of each file are described below. If data is not available for an attribute, put a '-' in its place. The geneId for a given gene must be the same in the marker information file and the homologs file.

Marker | Homologs | Common names

Marker

The following information is required for each marker:
  • taxId
    the genome the marker belongs to; should be numeric
  • chromosome
    the chromosome on which the marker is located
  • marker start position
    with respect to the chromosome
  • marker end position
    again, with respect to the chromosome
  • orientation
    orientation of the marker on the chromosome ('+' or '-', as in the sample data)
  • symbol
  • unique id
    for genes, it is the geneId

Sample Data:
9606	14	60858268	61087451	+	PRKCH	5583
9606	14	61107009	61191177	+	LOC400221	400221
9606	14	61231992	61284729	+	HIF1A	3091
9606	14	61298920	61332899	+	SNAPC1	6617
9606	14	61368391	61368840	-	LOC122867	122867
9606	14	61400683	61402320	+	MOCS3P	326305
Sample Files: human.dat, mouse.dat
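
The seven columns above can be read with a few lines of Python. This is only a minimal sketch and not part of Cinteny: the names Marker and parse_marker_line are hypothetical, each line is assumed to hold exactly seven fields separated by single tabs, and only the symbol column is checked for the '-' placeholder, because '-' in the orientation column is assumed to be a real orientation value, as in the sample data.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    """One row of the marker file (hypothetical container, for illustration)."""
    tax_id: int
    chromosome: str
    start: int
    end: int
    orientation: str
    symbol: Optional[str]
    unique_id: str


def parse_marker_line(line: str) -> Marker:
    """Parse one tab-delimited marker line into a Marker."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated columns, got {len(fields)}")
    tax_id, chromosome, start, end, orientation, symbol, unique_id = fields
    return Marker(
        tax_id=int(tax_id),          # taxId should be numeric
        chromosome=chromosome,       # kept as text (e.g. "14", "X")
        start=int(start),
        end=int(end),
        orientation=orientation,     # assumption: '-' here is an orientation
        symbol=None if symbol == "-" else symbol,  # '-' means not available
        unique_id=unique_id,
    )


# First row of the sample data, written with single tabs.
sample = "9606\t14\t60858268\t61087451\t+\tPRKCH\t5583"
marker = parse_marker_line(sample)
```

Keeping the chromosome and the unique id as text avoids losing non-numeric values; whether Cinteny itself treats them this way is not stated here.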

Homologous Groups

The following information is required for each homologous marker. All the columns are required.
  • HID
    ID of the homologous group it belongs to; should be numeric
  • taxId
    genome it belongs to; should be numeric
  • unique id
    for genes, it is the geneId; must be the same as in the marker file
  • symbol
    same as in the marker file
  • protein GI
    ignored (not used currently)
  • protein accession
    ignored (not used currently)

Sample Data:
74567   9606    4917    NTN2L   5453810     NP_006172.1
74567   10090   18209   Ntn3    6754904     NP_035077.1
74567   10116   114524  Ntn3    34871100    XP_343868.1
74567   7227    32400   NetB    17530937    NP_511155.1
74569   9606    5583    PRKCH   28557781    NP_006246.2
74569   10090   18755   Prkch   31543511    NP_032882.2
74569   10116   81749   Prkch   13592027    NP_112347.1
Sample File: homologshm.dat
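
To illustrate how the homologs file relates to the marker file, the sketch below groups rows by HID and reports unique ids that do not appear among the marker ids. It is an illustration under stated assumptions, not Cinteny's own code: the function names group_homologs and ids_missing_from_markers are hypothetical, and the input lines are assumed to be tab-delimited with all six columns present.

```python
from collections import defaultdict


def group_homologs(lines):
    """Map each HID to {taxId: [unique ids]} from tab-delimited homolog lines."""
    groups = defaultdict(lambda: defaultdict(list))
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
        hid, tax_id, unique_id = int(fields[0]), int(fields[1]), fields[2]
        # Columns 4-6 (symbol, protein GI, protein accession) are not needed here.
        groups[hid][tax_id].append(unique_id)
    return groups


def ids_missing_from_markers(groups, marker_ids):
    """Return the unique ids used in the homolog groups but absent from the markers."""
    used = {uid for by_tax in groups.values()
            for ids in by_tax.values() for uid in ids}
    return used - set(marker_ids)


rows = [
    "74569\t9606\t5583\tPRKCH\t28557781\tNP_006246.2",
    "74569\t10090\t18755\tPrkch\t31543511\tNP_032882.2",
    "74567\t9606\t4917\tNTN2L\t5453810\tNP_006172.1",
]
groups = group_homologs(rows)
missing = ids_missing_from_markers(groups, {"5583", "18755"})
```

Here missing contains only "4917", the one id in rows with no matching marker id.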

Common Names

This is an optional file. The output is easier to interpret when common names are provided. The following information is required for each genome:
  • taxId
    taxId of the genome
  • common name
    could be the common name or the scientific name

Sample Data:
9606    Human
9598    Chimp
9615    Dog
10090   Mouse
10116   Rat
9031    Chicken
7227    Droso
Sample File: names.dat
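
The fallback described at the top of this page, where the taxId serves as the genome name when no common name is supplied, can be sketched as follows. The function names load_names and genome_label are hypothetical, and the lines are assumed to contain a taxId and a name separated by a tab.

```python
def load_names(lines):
    """Build a {taxId: common name} dictionary from tab-delimited lines."""
    names = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        tax_id, name = line.split("\t", 1)
        names[int(tax_id)] = name.strip()
    return names


def genome_label(tax_id, names):
    """Return the common name if one was provided, otherwise the taxId as text."""
    return names.get(tax_id, str(tax_id))


names = load_names(["9606\tHuman", "10090\tMouse"])
human_label = genome_label(9606, names)   # name was provided
fly_label = genome_label(7227, names)     # no name provided, falls back to taxId
```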