This web site is designed to obtain mouse DNA sequences expressed in specific tissues and order them informatively for use in transcript microarray design. The mouse sequences, submitted either as cDNA or as expressed sequence tags (ESTs), are obtained from the Mouse Genome Database (MGD), which is curated by the Jackson Laboratories. As of April 16, 1999, the database contains a total of 388,815 cDNA and EST sequences. Mouse gene assignments and human homology relationships are curated by the MGD (see below), and each human homologue is compared against the Staudt Laboratory Lymphochip Microarray Human OncoGene set. The user may choose to have the sequences listed in the following order: 1) entries whose human homologue corresponds to a gene in the OncoGene set; 2) entries with an assigned mouse gene and a human homologue that does not necessarily correspond to a gene in the Human OncoGene set (see Comments about Comparison with the Staudt Lab Human OncoGene Set below); 3) sequences that have been assigned to a mouse gene; and finally 4) sequences that do not have a mouse gene assignment, with 3' sequences ranked before 5' sequences. Informative links from the sequence data are available as follows:
1) The DNA type links to source information about this entry in MGD at the Jackson Labs.
2) The UniGene Cluster Link links to the UniGene page at NCBI.
3) The Mouse AccID links to the GenBank sequence database entry file through Entrez at NCBI.
4) The Mouse Gene Symbol identified with this cDNA or EST links to gene information and literature references in MGD at the Jackson Labs.
5) The Homologous Human Gene links to the NIH/CIT mirror of the Weizmann Institute's GeneCards entry for this gene.
6) The Human Genome Database ID links to Genome Database information about the human gene.
7) Entries with a human homologue found in a defined subset of the Staudt Laboratory Lymphochip Microarray Human OncoGene set link to Med Miner, a text mining tool developed by the Biophysical Pharmacology Group, Division of Basic Sciences, National Cancer Institute (BPG/DBS/NCI). Med Miner constructs a Boolean query from the gene name, its synonyms, and the disorders with which they are associated, optionally expanded or restricted by user-supplied terms, for submission to PubMed at the National Library of Medicine.
Depending on the size of the libraries you are screening and the number of sequences you want returned, you may need to increase the amount of disk space your browser sets aside for its cache. Insufficient cache may generate an error message suggesting that the server is not responding. In Netscape, click on the browser window to activate it, go to Edit, then Preferences, and click the Advanced category to show its options. Choose Cache, where you will find a box for adjusting the disk cache size. A value greater than 7000 KBytes is recommended. (Remember to empty your cache on a regular basis.) In Internet Explorer, go to View, then Options, and choose Advanced. Click the Settings button and adjust the amount of disk space to use accordingly. You can also empty the cache folder here.
Gene Marker and Homology Assignments
The MGD maintains mouse gene marker and mammalian homology data for the sequences. More detailed information about the criteria MGD uses to curate genetic marker and mammalian homology assertions is available from the MGD. Briefly, a genetic marker is assigned to a sequence on the basis of one of the following relationships between the gene or its product and the DNA sequence: amplifies, encodes, hybridizes, or a marker determined by the MIT Whitehead Institute. ESTs from the Washington University HHMI Mouse EST Project receive a putative assignment based on a BLAST search against the gene database. The relationship of each sequence to its marker is given in the results list.
Description of Input
Select either 1) Rank by inclusion in the human oncogene set or 2) Rank alphabetically and note inclusion in the Staudt Laboratory Lymphochip Microarray Human OncoGene set. A selection is required; ranking by inclusion in the human oncogene set is the default. The first choice orders the sequences so that those assigned to a mouse gene with a human homologue in the Staudt Lab Lymphochip Microarray Human OncoGene set are returned first, in alphabetical order. These are followed by sequences with an assigned mouse gene that has a human homologue, then sequences with an assigned mouse gene only, and finally the unassigned sequences. The second choice orders alphabetically the sequences with an assigned mouse gene that has a human homologue, noting whether the gene is in the Staudt Lab Lymphochip Microarray Human OncoGene set, followed by sequences with an assigned mouse gene and finally the unassigned sequences. If you are looking for sequences assigned to a specific mouse gene, use the second option (rank alphabetically and note inclusion in the human oncogene set). Also consider using the cDNA and EST Expression Query Form at the Jackson Laboratories web site.
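The two ranking options amount to assigning each sequence to an output group and then sorting alphabetically within each group. The Python sketch below illustrates that grouping under stated assumptions: the `Entry` record, its field names, the mode strings `"oncogene"` and `"alphabetical"`, and the gene symbols in the usage lines are all hypothetical and are not taken from the site's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical record for one listed sequence (field names are illustrative)."""
    acc_id: str
    mouse_gene: Optional[str] = None   # assigned mouse gene symbol, if any
    human_gene: Optional[str] = None   # human homologue symbol, if any
    in_oncogene_set: bool = False      # homologue matched in the Human OncoGene set


def category(entry: Entry, mode: str) -> int:
    """Return the output group (0 = listed first) for the chosen ranking option."""
    has_gene = entry.mouse_gene is not None
    has_homologue = has_gene and entry.human_gene is not None
    if mode == "oncogene":
        # Option 1: OncoGene-set homologues, other homologues, gene only, unassigned.
        if has_homologue and entry.in_oncogene_set:
            return 0
        if has_homologue:
            return 1
        return 2 if has_gene else 3
    if mode == "alphabetical":
        # Option 2: homologues (OncoGene membership is only noted), gene only, unassigned.
        if has_homologue:
            return 0
        return 1 if has_gene else 2
    raise ValueError(f"unknown ranking mode: {mode!r}")


def order(entries: List[Entry], mode: str = "oncogene") -> List[Entry]:
    """Sort by output group, then alphabetically by mouse gene symbol within a group."""
    return sorted(entries, key=lambda e: (category(e, mode), (e.mouse_gene or "").lower()))


if __name__ == "__main__":
    entries = [
        Entry("acc1"),
        Entry("acc2", mouse_gene="GeneY", human_gene="GENEY", in_oncogene_set=True),
        Entry("acc3", mouse_gene="GeneZ"),
        Entry("acc4", mouse_gene="GeneX", human_gene="GENEX"),
    ]
    print([e.acc_id for e in order(entries, "oncogene")])      # ['acc2', 'acc4', 'acc3', 'acc1']
    print([e.acc_id for e in order(entries, "alphabetical")])  # ['acc4', 'acc2', 'acc3', 'acc1']
```

Note that in the second mode the OncoGene flag does not affect the position of an entry; it would only be displayed alongside it.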
Input the maximum number of sequences to review (fewer than 500 is HIGHLY recommended!). Warning! Retrieval may take a long time, depending on the number of tissue types chosen and the size of the tissue libraries. A quantity between 50 and 300 is recommended: larger numbers generate extensive output that may overwhelm a browser, and long lists are difficult to review at one time. If you are looking for sequences associated with one particular gene, use a relatively small number (10-50, depending on the number of tissue sources you choose). With a large number of tissue sources, this web site will be very slow, since it extracts the complete list of deposited cDNA and EST sequences for every tissue type chosen. You may want to consider using the cDNA and EST Expression Query Form at the Jackson Laboratories web site instead.
To start the list at a particular sequence, identified either by its sequence accession identifier (Mouse AccID) or by its mouse gene symbol, answer yes when asked whether you want to start the list at a certain mouse gene. Then either choose Mouse Gene Symbol and type in the symbol, or, to continue from a particular accession ID, choose Mouse AccID and type in the accession ID. The gene symbol or accession ID input is case-insensitive.
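Starting the list at a given gene symbol or accession ID can be pictured as skipping the already ordered list forward to the first case-insensitive match. The sketch below assumes each row is a dictionary with hypothetical keys `"acc_id"` and `"mouse_gene"`; the function name `start_list_at` is also hypothetical, and returning an empty list when nothing matches is an assumption, since the behavior for a missing symbol is not described here.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def start_list_at(rows: List[Row], field: str, value: str) -> List[Row]:
    """Drop rows before the first one whose `field` equals `value`, ignoring case.

    `field` is "mouse_gene" or "acc_id" (hypothetical key names). If nothing
    matches, this sketch returns an empty list (an assumption).
    """
    target = value.casefold()
    for i, row in enumerate(rows):
        # Unassigned sequences have no gene symbol; treat None as an empty string.
        if (row.get(field) or "").casefold() == target:
            return rows[i:]
    return []


if __name__ == "__main__":
    rows = [
        {"acc_id": "ACC1", "mouse_gene": "GeneA"},
        {"acc_id": "ACC2", "mouse_gene": "GeneB"},
        {"acc_id": "ACC3", "mouse_gene": None},
    ]
    print([r["acc_id"] for r in start_list_at(rows, "mouse_gene", "geneb")])  # ['ACC2', 'ACC3']
```

`str.casefold` is used rather than `lower` because it is Python's recommended form for caseless comparison.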
Select one or more mouse tissues from the 304 tissue types listed. In theory you can choose all of them; however, that would produce a list of over a quarter of a million sequences and is not particularly efficient. If you are searching all tissue types for expression of a certain gene, try the cDNA and EST Expression Query Form at the Jackson Laboratories web site.
Now hit SUBMIT. Results may take some time to return if the selected tissues correspond to large libraries.
Description of Output
The list can be ordered in one of two ways. With the first option, the order is 1) sequences assigned to a gene that has a human homologue included in the Staudt Lab Lymphochip Microarray Human OncoGene set, 2) sequences assigned to a gene that has a human homologue, 3) sequences assigned to a mouse gene, and finally 4) sequences without a gene assignment. With the second option, the order is 1) sequences assigned to a gene that has a human homologue, with a note as to whether the entry is in the Staudt Lab Lymphochip Microarray Human OncoGene set, 2) sequences assigned to a mouse gene, and finally 3) sequences without a gene assignment. Within each category, the list is ordered alphabetically by mouse gene name, then by the relationship to the gene (with PUTATIVE assignments ordered last within a gene set), then by sequence direction where available (3' before 5'), and finally by the tissue in which the particular sequence entry is expressed.
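The within-category ordering described above behaves like a multi-part sort key. The Python sketch below is one way to express it, under assumptions: the `Row` fields are hypothetical, relationships other than PUTATIVE are assumed to sort alphabetically, and rows with no recorded direction are assumed to follow 5' reads, since the text does not say where they fall.

```python
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """Hypothetical fields for one output row."""
    category: int             # output group from the chosen ranking option (0 = first)
    mouse_gene: str           # "" when no gene is assigned
    relationship: str         # e.g. "encodes", "hybridizes", "PUTATIVE", "-"
    direction: Optional[str]  # "3'", "5'", or None when not recorded
    tissue: str


DIRECTION_RANK = {"3'": 0, "5'": 1}  # 3' reads before 5' reads


def sort_key(row: Row) -> Tuple:
    return (
        row.category,
        row.mouse_gene.lower(),                  # alphabetical by mouse gene
        row.relationship.upper() == "PUTATIVE",  # False sorts first, so PUTATIVE goes last
        row.relationship.lower(),                # assumption: other relationships alphabetical
        DIRECTION_RANK.get(row.direction, 2),    # assumption: unknown direction after 5'
        row.tissue.lower(),                      # finally by tissue
    )


if __name__ == "__main__":
    rows = [
        Row(0, "GeneA", "PUTATIVE", "3'", "liver"),
        Row(0, "GeneA", "encodes", "5'", "brain"),
        Row(0, "GeneA", "encodes", "3'", "thymus"),
        Row(1, "GeneB", "-", None, "liver"),
    ]
    print([r.tissue for r in sorted(rows, key=sort_key)])  # ['thymus', 'brain', 'liver', 'liver']
```

Because Python compares tuples element by element, each later field only breaks ties left by the earlier ones, which matches the "then by ... then by" wording.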
The output contains the following columns:
Column 1: The name of the tissue type of the clone from which this sequence was obtained. If several tissue types have been selected, sequences that correspond to the same gene but come from different tissue types are grouped together in the output, so that the user can work with the list according to the expression pattern, as far as it can be discerned from the sequence libraries.
Column 2: The type of sequence, either cDNA or EST, and, where the information is available, whether the sequence is a 3' or 5' read. This entry links to the source summary provided by the MGD at the Jackson Labs, which typically includes the mouse strain, tissue of origin, cell line of origin, and a contact person for information about the clone. Further links to published references are available there.
Column 3: A "U" is present in this column if the GenBank accession identifier for the sequence (column 4) is known. The "U" links to the UniGene cluster page at NCBI for this sequence, where additional information about the cluster can be found.
Column 4: The GenBank accession ID for the mouse sequence. This links to the GenBank sequence database entry file through Entrez at NCBI.
Column 5: This contains the gene symbol for the mouse gene to which the sequence has been assigned. The Mouse Gene Symbol identified with this cDNA or EST will link to gene information and literature references in MGD at the Jackson Labs.
Column 6: This contains the relationship between the sequence and its gene or protein product, as provided by the MGD: 1) amplifies, 2) encodes, 3) hybridizes, 4) MIT Whitehead Institute marker, or "-" for an unknown relationship. ESTs may have a putative gene assignment based on a BLAST search of GenBank.
Column 7: This contains the symbol for the human gene if there is a human homologue of the mouse gene assigned to this sequence. The Homologous Human Gene links to the NIH/CIT mirror of the Weizmann Institute's GeneCards entry for this gene.
Column 8: This is the Human Genome Database identifier for the human gene, which links to Genome Database information about the human gene.
Column 9: This column contains "oncogene" if the human symbol stored in the Mouse Genome Database corresponds to a human gene symbol in the Staudt Lab Lymphochip Microarray Human OncoGene set. Because a human gene may have several synonymous symbols, there is no guarantee that the symbol stored in the Mouse Genome Database will exactly match its counterpart in the Human OncoGene set, even if the two are equivalent. In addition, human ESTs in the Human OncoGene set are not included in the search list, because they have no known gene symbol or identifier to compare against the human homologue identifier in the Mouse Genome Database. This entry also provides a link to Med Miner, a text mining tool developed by the Biophysical Pharmacology Group, Division of Basic Sciences, National Cancer Institute (BPG/DBS/NCI), for access to PubMed at the National Library of Medicine.
Comments about Comparison with the Staudt Lab Human OncoGene Set
The human homologue information extracted from the Mouse Genome Database (MGD) at the Jackson Laboratories is referenced by the human gene symbol, which may have several synonyms. This symbol is compared to the gene symbols, where available, in the Staudt Lab Lymphochip Microarray Human OncoGene set. If no gene symbol is available for the human sequence (e.g. a human EST without an official gene name assignment), or if the Mouse Genome Database and the Human OncoGene set use non-equivalent synonyms for the same gene, then no relationship between the oncogene set and the listed sequence will be noted.
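The limitation described above follows from comparing symbols directly, without a synonym table. The Python sketch below illustrates that kind of comparison; the function names and example symbols are hypothetical, and treating the match as exact and case-sensitive is an assumption, since the text does not say whether case is ignored.

```python
from typing import Iterable, Optional, Set


def build_symbol_set(oncogene_symbols: Iterable[Optional[str]]) -> Set[str]:
    """Collect the OncoGene-set symbols that can be compared at all.

    Human ESTs without a gene symbol (None here) have nothing to compare
    against, so they are left out of the search list.
    """
    return {sym for sym in oncogene_symbols if sym}


def in_oncogene_set(human_symbol: Optional[str], symbols: Set[str]) -> bool:
    """Exact symbol match only: no synonym lookup is performed, so a gene
    recorded under different synonyms in the two sources is not flagged."""
    return human_symbol is not None and human_symbol in symbols


if __name__ == "__main__":
    onco = build_symbol_set(["GENEA", None, "GENEB"])
    print(in_oncogene_set("GENEA", onco))   # True
    print(in_oncogene_set("GENEB1", onco))  # False: a synonym of GENEB would still be missed
    print(in_oncogene_set(None, onco))      # False: mouse gene without a human homologue
```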
Authors and Contributors
Authors and contributors are listed in Credits.