SAS DATA step with INFILE to import txt files into libraries.

SAS Programming 2

A quick way to import sequential files (flat txt files) that contain structured information separated by a special character into SAS libraries is to use a DATA step with INFILE. SAS reads the file and, at each separator, stores the value in the corresponding position (field).

Syntax:

DATA NewDatasetName ;
  INFILE "FullPath\FileName" DELIMITER=<separator character> <import options> ;
  < Variable declarations >
  < Variable input >
RUN ;

 

Exercise: import a txt file separated by the | character

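DATA InfoMayo ;
  /* Pipe-separated file; DSD keeps empty fields in place, MISSOVER keeps short lines from spilling into the next line */
  INFILE 'C:\DatosMayo.txt' DELIMITER='|' MISSOVER DSD ;
  /* A is numeric (8 bytes); the other 13 variables are character, 20 bytes each */
  LENGTH A 8 B $ 20 C $ 20 D $ 20 T $ 20 E $ 20 F $ 20 G $ 20 H $ 20 I $ 20 J $ 20 K $ 20 L $ 20 M $ 20 ;
  /* List input: fields are assigned to the variables in this order */
  INPUT A B C D T E F G H I J K L M ;
RUN ;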
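To try the same pattern without the source file, here is a minimal sketch that reads inline data instead. The dataset name DemoPipe, the three variables, and the sample rows are hypothetical; the real layout of DatosMayo.txt is not shown in this post.

```sas
/* Hypothetical three-field layout, for illustration only */
DATA DemoPipe ;
  /* INFILE DATALINES applies the same options to the inline rows below */
  INFILE DATALINES DELIMITER='|' MISSOVER DSD ;
  LENGTH Id 8 Name $ 20 City $ 20 ;
  INPUT Id Name City ;
  DATALINES ;
1|Ana|Bogota
2||Lima
3|Luis
;
RUN ;

/* Print the result to check how the fields were assigned */
PROC PRINT DATA=DemoPipe ;
RUN ;
```

With these options, the second row should keep Lima in City with a blank Name (DSD), and the third row should get a blank City instead of reading on into the next line (MISSOVER).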

 

Import options:

  • DELIMITER= The delimiter= option (alias dlm=) specifies the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma-separated file, .csv file), and dlm='09'x indicates that tabs separate your variables (e.g., a tab-separated file). In short: it sets the character that acts as the separator.
  • DSD The dsd option has two functions. First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50, SAS would treat it as 20 30 50, but with the dsd option SAS treats it as 20 30 . 50, which is probably what you intended. Second, it allows you to include the delimiter within quoted strings. For example, you would want to use the dsd option if you had a comma-separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS recognizes that the comma in "George Bush, Jr." is part of the name and not a separator indicating a new variable. In short: when data are missing, it keeps the remaining values from shifting position.
  • FIRSTOBS= This option tells SAS on which line to start reading your raw data file. If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number where the data actually begin. For example, if you are reading a comma-separated or tab-separated file that has the variable names on the first line, use firstobs=2 to tell SAS to begin reading at the second line (so it ignores the first line with the names of the variables). In short: it indicates the line at which reading of the file begins.
  • MISSOVER This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data are supposed to have only one observation per line of raw data, this could cause errors throughout the rest of your data file. If you have a raw data file with one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file. In short: it establishes that each line holds an independent record.
  • OBS= Indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to read in just the first 100 lines of data while you are testing your program. When you want to read the entire file, you can remove the obs= option entirely. In short: it indicates the last line number to read from the file.
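The options above can be combined in a single INFILE statement. The following is only a sketch under stated assumptions: the fileref demo, the dataset Scores, the variable names, and the sample rows are all made up for illustration. It first writes a small comma-separated file to a temporary location so that the example is self-contained.

```sas
/* Hypothetical temporary file, used only so the example can run on its own */
FILENAME demo TEMP ;

DATA _NULL_ ;
  FILE demo ;
  PUT 'id,name,score' ;            /* line 1: header with variable names */
  PUT '1,"Bush, George",20' ;      /* line 2: delimiter inside quotes    */
  PUT '2,,30' ;                    /* line 3: two consecutive delimiters */
  PUT '3,Carla' ;                  /* line 4: short line, score absent   */
  PUT '4,Dana,50' ;                /* line 5: beyond OBS=4, not read     */
RUN ;

DATA Scores ;
  /* FIRSTOBS=2 skips the header; OBS=4 stops after line 4 of the file */
  INFILE demo DLM=',' DSD MISSOVER FIRSTOBS=2 OBS=4 ;
  LENGTH id 8 name $ 20 score 8 ;
  INPUT id name score ;
RUN ;

PROC PRINT DATA=Scores ;
RUN ;
```

Under these assumptions Scores should contain three observations (file lines 2 to 4): the quoted name is kept as one value with its comma, the empty name on line 3 stays blank without shifting the score, and the missing score on line 4 is set to missing rather than taken from line 5. For a tab-separated file, DLM='09'x would replace DLM=','.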

 

Bibliography:

  • http://www.ats.ucla.edu/stat/sas/faq/InfileOptions_ut.htm
  • http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm

 

Do you know of another method for importing files?

What did you think of the content?