Data Import Wizard

ArrayStar provides a dialog for importing delimited text files containing microarray data, sequence data, or annotation information. The name of the dialog changes depending on your point of entry.

 

Point of Entry: In the Add Experiments to Import screen of the Project Setup Wizard, check "Force Custom Importing" and click the Import from ArrayStar Project button.
Dialog Name: Custom Experiment Import Wizard

Point of Entry: In the Set Up Attributes dialog of the Project Setup Wizard, check "Force Custom Importing" and click the Import Experiment Attributes button.
Dialog Name: Custom Attribute Import Wizard

Point of Entry: Use the File > Import Annotations command.
Dialog Name: Custom Annotation Import Wizard

 

The following window appears:

 

 

The top half of the window allows you to specify how your file is delimited. The bottom half displays the contents of your text file as it would be used with the currently selected settings in the top portion of the window. As you change the options on top, the preview updates on the bottom.

 

      To get started, if you have previously created a File Template that you wish to use, select it from the dropdown list at the top of the window. Otherwise, use the default value in this field.

 

      Specify the Delimiter Type used in your file. If a delimiter other than a tab, space, comma, or semicolon is used, select Other and then enter the delimiter in the Other field.

 

      In the Comment line indicator field, enter the string that marks lines to be ignored. Lines that begin with this string are skipped as comment lines during the import. If this field is left empty, no lines are treated as comments.

 

      To the right of Skip first, enter the number of lines you would like the wizard to skip at the beginning of your file, such as comment lines that appear at the top of your file. If there is no header in the file, check the box next to No header line.
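The effect of these settings on the preview can be illustrated with a short Python sketch. This is not ArrayStar code: the function name preview_rows and its parameters are hypothetical, and the order in which skipped lines and comment lines are handled is an assumption made for illustration.

```python
import csv

def preview_rows(text, delimiter="\t", comment_prefix="", skip_first=0, no_header=False):
    """Split delimited text into (header, rows), mimicking the wizard's options."""
    # "Skip first": drop the first N lines of the file (assumed to happen first).
    lines = text.splitlines()[skip_first:]
    # "Comment line indicator": an empty string means no line is a comment.
    if comment_prefix:
        lines = [ln for ln in lines if not ln.startswith(comment_prefix)]
    # "Delimiter Type": split every remaining line on the chosen character.
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return [], []
    if no_header:
        # "No header line": invent placeholder names and keep every row as data.
        return [f"Column {i + 1}" for i in range(len(rows[0]))], rows
    return rows[0], rows[1:]

sample = (
    "exported by scanner software\n"
    "# comment line\n"
    "Gene ID\tSignal\n"
    "BRCA1\t10.5\n"
    "# another comment\n"
    "TP53\t8.2\n"
)
header, rows = preview_rows(sample, delimiter="\t", comment_prefix="#", skip_first=1)
print(header)
print(rows)
```

With one skipped line and "#" as the comment indicator, only the header line and the two data lines remain. Changing any argument changes the result, much as the preview updates when you change an option.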

 

Once you have entered the delimiting information for your text file, click Next. The following window appears:

 

 

Specify the type of data in each column by selecting one of the following options from each header’s dropdown list:

 

      Ignore - The column will not be used.

 

      Gene ID - This option appears only when importing microarray expression files. The column contains an identifying key for each gene. All columns designated as “Gene ID” will be available for use in the Gene Table view via the Manage Columns dialog.

 

Note: When importing microarray expression data, at least one Gene ID column must match the Gene ID chosen for the existing datasets in your project. Otherwise, an error message appears, and you are asked to re-assign an appropriate column before the import can proceed.

 

      Probe ID – The column contains probe identifiers. This option appears only when importing microarray expression files.

 

      Signal - The column contains the signal for one experiment for each gene. This option appears only when importing microarray expression files.

 

      Sequence Name – The column contains the names of your sequences. This option appears only when importing sequence data.

 

      Sequence – The column contains the actual nucleotide sequence. This option appears only when importing sequence data.

 

      Description - The column contains descriptive information or gene annotations for each gene. All columns designated as “Description” will be available for use in the Gene Table view via the Manage Columns dialog. Description columns are always optional.

 

      Aligned Template – The column specifies which template sequence the reads align to. This option only applies to alignment data.

 

      Start Position – The column specifies the start position of the read along the template. This option only applies to alignment data.

 

      End Position – The column specifies the end position of the read along the template. This option only applies to alignment data.

 

      Aligned Strand - The column specifies the orientation of the read relative to the template. This option only applies to alignment data.

 

      SNP Reference ID – The name of the contig or chromosome on which the SNP occurs. This maps to the Contig ID column of the SNP Table.

 

      SNP Position – The coordinate of the SNP on the contig or chromosome. This maps to the Ref Pos column of the SNP Table.

 

      SNP Reference Sequence – The reference base at the site of the SNP. This maps to the Ref Base column of the SNP Table.

 

      SNP Sequence – The column must contain a single base, which may be a degenerate (ambiguity) letter code that represents multiple alleles. This maps to the Called Seq column of the SNP Table.

 

      SNP Sequence Alleles – The column contains a separate base letter code for each allele. Only unambiguous base letter codes and 'N' are used.
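The difference between the SNP Sequence and SNP Sequence Alleles column types can be illustrated with the standard IUPAC nucleotide ambiguity codes. The sketch below is only an illustration of those codes. The function names are hypothetical, and nothing here describes how ArrayStar stores or converts the values internally.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Reverse lookup: a set of bases back to its single-letter code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}

def alleles_for_code(code):
    """Expand one (possibly degenerate) letter into its unambiguous bases."""
    return list(IUPAC_BASES[code.upper()])

def code_for_alleles(alleles):
    """Collapse a collection of unambiguous bases into one IUPAC letter."""
    return CODE_FOR[frozenset(a.upper() for a in alleles)]

print(alleles_for_code("R"))         # ['A', 'G']
print(code_for_alleles(["G", "A"]))  # R
```

A single degenerate letter such as R is the kind of value a SNP Sequence column holds, while the expanded bases A and G are the kind of per-allele values a SNP Sequence Alleles column holds.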

 

ArrayStar offers the option of applying a certain data type to multiple columns. This may be desirable if your file contains a large number of columns with the same data type. Select a type from the Column Type dropdown menu, then use Apply to all to apply the selected type to all columns or Apply to remaining to apply the type only to columns that are currently set to "Ignore." Click Clear all if you wish to reset all columns back to "Ignore."
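The three bulk actions differ only in which columns they overwrite. A minimal Python sketch of that behavior follows. The function names are hypothetical and the column types are modeled as a plain list of strings, which is an assumption made for illustration.

```python
IGNORE = "Ignore"

def apply_to_all(column_types, selected):
    """Apply to all: every column receives the selected type."""
    return [selected for _ in column_types]

def apply_to_remaining(column_types, selected):
    """Apply to remaining: only columns still set to "Ignore" are changed."""
    return [selected if t == IGNORE else t for t in column_types]

def clear_all(column_types):
    """Clear all: every column is reset to "Ignore"."""
    return [IGNORE for _ in column_types]

columns = ["Gene ID", IGNORE, IGNORE, "Description"]
print(apply_to_remaining(columns, "Signal"))
print(apply_to_all(columns, "Signal"))
print(clear_all(columns))
```

In this example, Apply to remaining leaves the Gene ID and Description assignments untouched and sets only the two unassigned columns to Signal.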

 

Notes pertaining to certain situations:

 

      When importing microarray expression data files, at least one Gene ID column and one Signal column are required. If raw microarray data are being imported, at least one Probe ID column is required. If multiple Signal columns are provided, each column is treated as a separate experiment.

 

      When importing sequence data files, at least one Sequence Name and one Sequence column are required.

 

      When importing a custom text file, note that each row must have a unique combination of template ID (usually the contig or chromosome name) and template position. Before importing a text file, you may need to open it in a spreadsheet application and manually remove any duplicates. Because of this uniqueness requirement, SeqMan files usually must be imported via the default method rather than with the Data Import Wizard.
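For large files, checking for duplicate template ID and position pairs by hand can be tedious. The following Python sketch shows one way to list the duplicated pairs before importing. It runs outside ArrayStar, and the function name and the column names "Contig" and "Position" are hypothetical, so substitute the headers used in your own file.

```python
import csv
import io
from collections import Counter

def find_duplicate_positions(rows, template_col, position_col):
    """Return each (template ID, position) pair that occurs in more than one row."""
    counts = Counter((row[template_col], row[position_col]) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Small tab-delimited example; in practice, open your own file instead.
sample = (
    "Contig\tPosition\tRef\tCalled\n"
    "chr1\t1042\tA\tG\n"
    "chr1\t2210\tC\tT\n"
    "chr1\t1042\tA\tR\n"
    "chr2\t1042\tG\tA\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
duplicates = find_duplicate_positions(rows, "Contig", "Position")
print(duplicates)
```

Here position 1042 on chr1 appears twice and is reported, while position 1042 on chr2 is not, because it belongs to a different template.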

 

If you made any changes to the selected File Template, a Save button will appear in the top right of this window. To store the current settings as a new template, click the Save button, enter a new name for your template, then click Save. Your new template will appear in the File Template dropdown list the next time you import a file.

 

If you are importing multiple files and would like to apply the current settings to all of them, check the box next to Apply settings to all files in this batch. The settings will be retained until the Project Setup Wizard is closed.

 

Finally, click Finish to begin importing your data.