Read 2D waves using file reference number (without line number)

When parsing text data files, it's common to encounter the need to read a data block, such as a 2D wave in Igor. In this context, I'd like to outline two approaches I've employed.

The first method is encapsulated within the function "ReadMatrixPlain()".

The second method involves two functions: "ReadMatrix()" and "ReadMatrixMain()". This approach is inspired by the sscanf and fscanf functions in MATLAB. In MATLAB, these functions can be called with the file identifier (fileID, akin to the file reference number in Igor) and the data block size (sizeA) using the following syntax:

A = fscanf(fileID,formatSpec,sizeA)
A = sscanf(str,formatSpec,sizeA)

Below is the corresponding Igor code, followed by the data file listing. Any advice is welcome.

 

#pragma TextEncoding = "UTF-8"
#pragma rtGlobals=3     // Use modern global access method and strict wave access.
#pragma ModuleName=Demo

static function ReadMatrixPlain()
    Variable number_of_rows, number_of_columns
    String format_str, buffer
    Variable ref_num, i, row

    Open/R ref_num as "D:matrix.txt"
    number_of_rows = 7
    number_of_columns = 11
    Variable v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11

    // construct format string
    format_str = "%d"
    for (i = 0; i < number_of_columns-1; i++)
        format_str += " %f"
    endfor

    for (row = 0; row < number_of_rows; row++)
        FReadLine ref_num, buffer
        buffer = TrimString(buffer)
        // In this case, one has to write all var: v1, v2, ..., v11
        // I think this is not "elegant"
        sscanf buffer, format_str, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
        Printf format_str + "\n", v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
    endfor
    Close ref_num  // release the file opened above
end

static function/WAVE ReadMatrix(Variable ref_num, Variable number_of_rows, Variable number_of_columns)
    String buffer
    Variable row

    Make/O/FREE/N=(number_of_rows, number_of_columns) wave_numerical_2d
    Make/T/O/FREE/N=(number_of_columns, number_of_rows) wave_text_2d_transposed

    for (row = 0; row < number_of_rows; row++)
        FReadLine ref_num, buffer
        buffer = TrimString(buffer)
        buffer = ReplaceString(" ", buffer, ";")
        buffer = RemoveFromList("", buffer)
        Wave/T wave_text_1d = ListToTextWave(buffer, ";")  // free text wave
        wave_text_2d_transposed[][row] = wave_text_1d[p]
    endfor
    wave_numerical_2d[][] = str2num(wave_text_2d_transposed[q][p])  // transpose
    //Print wave_numerical_2d
    return wave_numerical_2d
end

static function ReadMatrixMain()
    Variable number_of_rows = 7
    Variable number_of_columns = 11

    Variable ref_num
    Open/R ref_num as "D:matrix.txt"

    Wave wave_2d = ReadMatrix(ref_num, number_of_rows, number_of_columns)
    Duplicate/O wave_2d, wout

    close ref_num
end

matrix.txt 

    1  0.000  0.010  0.000  0.010  0.000  0.032  0.000  0.033  0.000  0.086
    2  0.000  0.001  0.000  0.001  0.006  0.002  0.000  0.002  0.006  0.017
    3  0.000  0.005  0.000  0.005  0.001  0.005  0.000  0.005  0.001  0.021
    4  0.005  0.000  0.041  0.000  0.000  0.000  0.000  0.000  0.000  0.046
    5  0.000  0.002  0.000  0.002  0.033  0.072  0.000  0.072  0.033  0.216
    6  0.003  0.000  0.016  0.000  0.000  0.000  0.002  0.000  0.000  0.021
    7  0.000  0.006  0.000  0.006  0.000  0.002  0.000  0.002  0.000  0.015

 

Welcome to the forum!

Maybe I don't get the issue here, but does something not work? I am not sure why you use a manual reading approach using FReadLine. Why not use LoadWave, Concatenate and then trim / convert as needed? This could be done in fewer lines and may be faster. Or is there some complication we do not (yet) know about such as text in-between the numbers?

Have a look at the (rather extensive) options for LoadWave, in particular the /M flag:

DisplayHelpTopic "LoadWave"

 

In reply to by chozo

Thank you for your comments and suggestions. I apologize for any confusion. Let me provide some background context. Currently, I am parsing the PROCAR file from VASP, which is written in Fortran (for more details, please refer to https://www.vasp.at/wiki/index.php/PROCAR). I have developed a subroutine for this task using MATLAB. However, it's not straightforward to "translate" MATLAB code into Igor language.

The first 40 lines of a PROCAR file are shown below.

PROCAR lm decomposed                                                                                                                                                                         # of k-points:  150         # of bands:   96         # of ions:    7

 k-point     1 :    0.33333333 0.33333333 0.00000000     weight = 0.00666667

band     1 # energy  -51.70389416 # occ.  1.00000000

ion      s     py     pz     px    dxy    dyz    dz2    dxz  x2-y2    tot
    1  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    3  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    4  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    5  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    6  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    7  0.000  0.249  0.000  0.746  0.000  0.000  0.000  0.000  0.000  0.995
tot    0.000  0.249  0.000  0.746  0.000  0.000  0.000  0.000  0.000  0.995

band     2 # energy  -51.70389123 # occ.  1.00000000

ion      s     py     pz     px    dxy    dyz    dz2    dxz  x2-y2    tot
    1  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    3  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    4  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    5  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    6  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    7  0.000  0.746  0.000  0.249  0.000  0.000  0.000  0.000  0.000  0.995
tot    0.000  0.746  0.000  0.249  0.000  0.000  0.000  0.000  0.000  0.995

band     3 # energy  -51.69782047 # occ.  1.00000000

ion      s     py     pz     px    dxy    dyz    dz2    dxz  x2-y2    tot
    1  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    3  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    4  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    5  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    6  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    7  0.000  0.000  0.995  0.000  0.000  0.000  0.000  0.000  0.000  0.995
tot    0.000  0.000  0.995  0.000  0.000  0.000  0.000  0.000  0.000  0.995

[Parsing this file yields a high-dimensional quantity, w_{spin, ikpt, iband, iorbital, ion}. In the case above, ikpt ranges from 1 to 150, iband from 1 to 96, and ion from 1 to 7. The spin index can be either 1 or 2.]

 

The PROCAR file consists of text, empty lines, and matrices interspersed throughout its contents. To extract data from the text sections, I employ the sscanf function. However, I've observed that certain versions of VASP may omit empty lines. To streamline the reading process, I opt to omit all empty lines using a custom user-defined function called "ReadLine".

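static function/S ReadLine(ref_num)
    // Returns the next non-blank line with surrounding whitespace trimmed.
    // Returns "" at the end of the file instead of looping forever.
    Variable ref_num
    String buffer
    do
        FReadLine ref_num, buffer
        if (strlen(buffer) == 0)  // zero length before trimming means end of file
            return ""
        endif
        buffer = TrimString(buffer)
    while (strlen(buffer) == 0)  // a blank line is empty only after trimming
    return buffer
end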
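
A minimal sketch of the sscanf step for the "band" header lines, assuming the line layout shown in the PROCAR excerpt above. The helper name ParseBandLine and its pass-by-reference parameters are hypothetical and only for illustration. It relies on V_flag, which sscanf sets to the number of values it converted, and assumes that whitespace in the format string matches any run of blanks in the input.

```igor
static function ParseBandLine(String line, Variable &iband, Variable &energy, Variable &occ)
    // Hypothetical helper: parses a header line such as
    // "band     1 # energy  -51.70389416 # occ.  1.00000000"
    Variable v1, v2, v3
    sscanf line, "band %d # energy %f # occ. %f", v1, v2, v3
    if (V_flag != 3)
        return -1    // the line did not match the expected layout
    endif
    iband = v1
    energy = v2
    occ = v3
    return 0
end
```

A caller could pass the string returned by ReadLine(ref_num) and check the return value before moving on to the matrix block, so that a layout change in a different VASP version is noticed rather than silently misread.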

 

In my case, I find it necessary to count the number of lines to properly utilize the LoadWave function for loading the matrices between the text sections. However, this step can sometimes be overlooked, leading to errors. So I gave up using line numbers and sought a solution that doesn't rely on them, aiming for maximum efficiency. That's the background behind writing the above code.

In reply to by chozo

chozo wrote:

Welcome to the forum!

Maybe I don't get the issue here, but does something not work? I am not sure why you use a manual reading approach using FReadLine. Why not use LoadWave, Concatenate and then trim / convert as needed? This could be done in fewer lines and may be faster. Or is there some complication we do not (yet) know about such as text in-between the numbers?

Thank you for your comments and suggestions. It seems that I have to count the number of lines in order to use LoadWave in my case. This step is easy to overlook, which leads to errors.

In reply to by tony

tony wrote:

Have a look at the (rather extensive) options for LoadWave, in particular the /M flag:

DisplayHelpTopic "LoadWave"

 

 

Thank you for your comments and suggestions. Please allow me to explain the background of the issue in more detail. The explanation is a bit lengthy. I apologize for mistakenly posting it as a new Reply instead of a Quote.

In reply to by guoqilin

guoqilin wrote:

In my case, I find it necessary to count the number of lines to properly utilize the LoadWave function for loading the matrices between the text sections. 

You can load the file as general text to extract the matrices between the text sections, like this:

LoadWave/M/G/O/N=matrix path_to_PROCAR_file

When I use your example PROCAR file, this code produces three matrix waves called matrix0, matrix1, and matrix2, which contain the numeric data for each of the three data blocks.
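
If the blocks are needed as one object, the following is a minimal sketch of stacking them into a single 3D wave. The function name LoadBlocks and the output name procar_blocks are hypothetical. It assumes every loaded block has the same dimensions, since otherwise Concatenate will not produce a 3D result, and it relies on the V_flag and S_waveNames outputs that LoadWave sets.

```igor
function LoadBlocks(String path_to_PROCAR_file)
    // Same flags as the command above; /Q only suppresses the history output.
    LoadWave/M/G/O/Q/N=matrix path_to_PROCAR_file
    if (V_flag == 0)
        Print "No data blocks were loaded"
        return -1
    endif

    // S_waveNames is the semicolon-separated list of the waves just loaded.
    // For 2D source waves of equal size, Concatenate promotes the
    // destination to a 3D wave with one layer per data block.
    Concatenate/O S_waveNames, procar_blocks  // procar_blocks: hypothetical name
    Wave blocks = procar_blocks
    Print DimSize(blocks, 0), DimSize(blocks, 1), DimSize(blocks, 2)
    return 0
end
```

Whether the "tot" rows and the k-point header line end up inside the loaded blocks depends on how the general text loader interprets them, so it is worth inspecting the dimensions printed at the end before indexing by ion.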

In reply to by Ben Murphy-Baum

Thank you for your insightful example. I will now try to improve my code.