[ Vlado Keselj | 2020-06-02..2020-06-13 ]

Starfish: The First Example

The Starfish project is a system for Perl-based text-embedded programming and preprocessing. It is still only author-friendly, in the sense that I use it frequently for many projects, but its purpose and use may not be obvious to an interested user seeing it for the first time. Several basic examples are included on the project web page, but they are quite diverse, and I am not sure which one is best to present to an interested reader who does not know anything about the project. One promising approach is to present a simple Java example, which was chronologically the first motivating example for starting the project. This blog post describes that example.

Java Preprocessor

The C preprocessor is a useful and distinctive feature of the C programming language. It is a part of the C compiler, but it is a simple language in its own right, which performs simple text manipulation on the source before feeding it to the C compiler proper. One use of the preprocessor is the inclusion or exclusion of parts of code depending on the values of some configuration variables. It processes C source code as general text, which is why it is sometimes criticized for not using the deeper semantics of the language, and also praised for the same reason: it is very clear what it does, and it can be used on text other than C programs. For example, it was used in the Imake system for preprocessing makefiles. Java does not have a preprocessor, and one would be useful in some situations.

Sometime around 2001 I was working on a Java software system where I needed two versions of the source code: a test version to be used for testing and development, and a release version for production. The test version would carry around a lot of meta information on data structures and be able to produce verbose debug output, while the release version would be efficient and slim in code size and running time. This means that at various places in the code I needed to write two versions of code snippets, a test version and a release version, and the appropriate version would be included everywhere based on the value of some global variable. This could be simulated using Java constructs, but the release code would be bloated, and run-time efficiency of the release code is not always easy to achieve.
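
To make the "Java constructs" alternative concrete, here is a minimal sketch of what such a simulation might look like. The class FlagSketch, the flag TEST, and the field debugLabel are hypothetical names chosen for illustration; they are not taken from the original system. The point is only that both versions live in the same source, and that test-only members such as debugLabel remain part of the class even when the flag selects the release behaviour.

```java
class FlagSketch {
    // Hypothetical global switch; set to false for a release build.
    static final boolean TEST = true;

    // Meta information that only the test version needs. The field is
    // declared (and stored in every object) in both versions.
    private final String debugLabel;
    private final int value;

    FlagSketch(int value, String debugLabel) {
        this.value = value;
        this.debugLabel = debugLabel;
    }

    // Chooses between the test and the release behaviour at each use site.
    String describe() {
        if (TEST) {
            return "Test version: " + debugLabel + "=" + value;
        }
        return "Release version";
    }

    public static void main(String[] args) {
        System.out.println(new FlagSketch(42, "answer").describe());
    }
}
```

Every place that differs between the two versions needs such a branch, which is the bloat mentioned above.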

As an example, we will consider the following simple Java code:

/**
   A simple Java file.
*/

public class simple {

  public static int main(String[] args) {

    System.out.println("Test version");
    System.out.println("Release version");

    return 0;
  }
}
Here the line printing "Test version" would be included in the test version of the code, and the line printing "Release version" would be included in the release version of the code.

One solution would be to use the C preprocessor. However, the C preprocessor is a part of the C compiler, and it is neither meant for nor convenient for independent use. Its functionality is tailored to the C language, and it is not easy to use for the more general and flexible text processing that we may want. It is more convenient to write a text processor in Perl from scratch than to rely on the C preprocessor. That leads to the second solution: write an independent preprocessor from scratch in a text-friendly high-level language like Perl. There is a general need for such a tool, so we arrive at the idea of a general-purpose preprocessing system. The m4 system is one such system, but it has its own, specific syntax. So, here we are, looking for a general preprocessor.

Fully-Embedded Preprocessor

For our preprocessing task, there are many ways to write a preprocessor that would include or exclude annotated parts of code. Similarly to the C preprocessor, it would read our source Java file and produce another Java file prepared for compilation. To distinguish the two files, we could come up with a different file name extension for the first Java file, which we could call a meta-source code file. One issue with this approach is that we now must manage two files for the same Java source file. The second issue is the question of what exactly our preprocessor should look like. We could emulate the functionality of the C preprocessor, but designing a new universal preprocessor allows us to think bigger and aim at a more open-ended, general functionality. Both of these issues are addressed by a fully-embedded preprocessor, which keeps the preprocessing instructions and the preprocessing result in the same file, and allows quite general Perl preprocessing code. Starfish provides this functionality.

Our example Java file could be written in the following way using the Starfish convention:

/**
   A simple Java file.
*/
// Uncomment version:
//<?   $Version = 'Test';    !>
//<? # $Version = 'Release'; !>

public class simple {

  public static int main(String[] args) {

    //<? $O = "    ".($Version eq 'Test' ?
    // 'System.out.println("Test version");' :
    // 'System.out.println("Release version");' );
    //!>

    return 0;
  }
}
Starfish code is actually embedded Perl code found between the delimiters <? and !>, and it is commented out using the Java line comment notation //. The two lines after the comment "Uncomment version:" are used to choose the version of the software that we want to produce. The second of them, which would set 'Release', contains code commented out in Perl, so the chosen version is the "Test" version. The snippet inside the main method shows how we can select the appropriate line of Java and produce it. The Perl variable $O is a special variable that specifies the generated code. Starfish also has a command echo that effectively appends to this variable.

Preprocessing does not produce a new file; instead, the source file itself is updated. This is called the update mode of Starfish, and it is the default mode. This is why we call Starfish a fully-embedded preprocessor. If the name of the Java file is simple.java, then we would run the following command:

 $ starfish simple.java
and the contents of the file simple.java are now:
/**
   A simple Java file.
*/
// Uncomment version:
//<?   $Version = 'Test';    !>
//<? # $Version = 'Release'; !>

public class simple {

  public static int main(String[] args) {

    //<? $O = "    ".($Version eq 'Test' ?
    // 'System.out.println("Test version");' :
    // 'System.out.println("Release version");' );
    //!>//+
    System.out.println("Test version");//-

    return 0;
  }
}
We can see that the desired line of code has been generated and inserted in the file (the part between the markers //+ and //-). Because the generated part is delimited by the strings //+ and //-, running starfish again on the file does not change it: the generated part is simply replaced with the same generated string. For the same reason, we must avoid using the strings //+ and //- for other purposes in the file.
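
The role of the //+ and //- markers can be illustrated with a small toy program. This is only a sketch of the idea, not how Starfish is implemented: the class UpdateModeSketch and its update method are hypothetical, the Perl code is not evaluated (the "generated" text is simply passed in), and only snippets ending in //!> are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpdateModeSketch {
    // End of a snippet ("//!>"), optionally followed by output left by an
    // earlier run ("//+ ... //-"). DOTALL lets ".*?" span line breaks.
    private static final Pattern SNIPPET_END =
        Pattern.compile("//!>(?://\\+.*?//-)?", Pattern.DOTALL);

    // Places `generated` after each snippet end, replacing earlier output.
    static String update(String source, String generated) {
        String replacement = "//!>//+\n" + generated + "//-";
        return SNIPPET_END.matcher(source)
                          .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String source = "    //<? (Perl code)\n    //!>\n\n    return 0;\n";
        String test = "    System.out.println(\"Test version\");";

        String once = update(source, test);   // first run inserts the output
        String twice = update(once, test);    // second run replaces it in place

        System.out.print(once);
        System.out.println("unchanged on second run: " + once.equals(twice));
    }
}
```

Since the old output is always located by its markers and replaced as a whole, a second run with the same generated text leaves the file as it was, and a run with different text swaps the old output for the new one.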

If we comment out the 'Test' line and uncomment the 'Release' line in the new simple.java file as follows:

/**
   A simple Java file.
*/
// Uncomment version:
//<? # $Version = 'Test';    !>
//<?   $Version = 'Release'; !>

public class simple {

  public static int main(String[] args) {

    //<? $O = "    ".($Version eq 'Test' ?
    // 'System.out.println("Test version");' :
    // 'System.out.println("Release version");' );
    //!>//+
    System.out.println("Test version");//-

    return 0;
  }
}
and run:
 $ starfish simple.java
again, the file simple.java will look as follows:
/**
   A simple Java file.
*/
// Uncomment version:
//<? # $Version = 'Test';    !>
//<?   $Version = 'Release'; !>

public class simple {

  public static int main(String[] args) {

    //<? $O = "    ".($Version eq 'Test' ?
    // 'System.out.println("Test version");' :
    // 'System.out.println("Release version");' );
    //!>//+
    System.out.println("Release version");//-

    return 0;
  }
}

Preprocessing Multiple Files

If we want to preprocess a number of Java files in a project, it would be tedious and error-prone to modify each of them to set the appropriate Test or Release version. There are several ways to solve this problem, and we will describe three of them: (1) using the Perl require command, (2) using Make and the Starfish -e option, and (3) using the Starfish starfish.conf configuration file.

(1) Using the Perl require command: To have one $Version parameter controlling many files, we could simply have a Perl file called configuration.pl with the following content:

#!/usr/bin/perl
$Version = 'Test'; # Test or Release
1;
and one of the first lines in each Java source file would be:
//<? require './configuration.pl' !>
In this way, we would have one point of control for the Test or Release version of all files.

(2) Using Make and the Starfish -e option: Starfish has an option -e for executing initial Perl code, somewhat similar to the -e option of Perl, and we can use it to set the $Version variable. For example, if we use a Makefile to compile all Java files in a project, we could add a preprocessing command for each of them to the Makefile in the following way:

VERSION=Test

simple.class: simple.java
	starfish -e='$$Version="$(VERSION)"' $<
	javac $<
We would again have one point of version control, this time in the Makefile.

(3) Using the Starfish starfish.conf configuration file: The idea of using a Perl configuration file, as shown in (1), is so common that Starfish uses a standard name for the configuration file, starfish.conf, and provides the command read_starfish_conf to include it. Similarly to (1), the contents of the file starfish.conf would be:

$Version = 'Test'; # Test or Release
1;
and one of the first lines in each Java source file would be:
//<? read_starfish_conf !>
This is the common way to represent per-directory configuration in Starfish. One important difference between this approach and the earlier approach with a plain Perl configuration file (1) is that read_starfish_conf behaves in a special way. Namely, the command read_starfish_conf looks for a file named starfish.conf in the current directory; if it is found, it then looks for a file of the same name in the parent directory. If that one is found as well, it looks in the parent of the parent directory, and so on, until it cannot find a file with that name or until it reaches the top directory. After that, it executes, or more precisely "requires" in Perl terminology, all the starfish.conf files found, from top to bottom. Each file is executed with its own directory as the current directory. This provides a hierarchical per-directory configuration. A similar process is sometimes used in a system of Makefiles in a project with multiple directories, and in the Imake system for Makefile generation.
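
The search order described above can be sketched in a few lines of Java. This is an illustration of the described lookup only, written under the assumption that the description above is complete; the class ConfChainSketch and its methods are hypothetical, and the sketch merely lists the files in execution order rather than executing them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ConfChainSketch {
    // Returns the starfish.conf files that would be used for startDir,
    // ordered from the topmost directory down to startDir itself.
    static List<Path> confChain(Path startDir) {
        LinkedList<Path> chain = new LinkedList<>();
        Path dir = startDir.toAbsolutePath().normalize();
        while (dir != null) {
            Path conf = dir.resolve("starfish.conf");
            if (!Files.isRegularFile(conf)) {
                break;              // first directory without the file ends the search
            }
            chain.addFirst(conf);   // parents go in front: top-to-bottom order
            dir = dir.getParent();  // becomes null above the top directory
        }
        return chain;
    }

    // Builds root/a/b in a temporary directory, with starfish.conf in a and
    // in a/b but not in root, and returns the chain for a/b relative to root.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("starfish-sketch")
                             .toAbsolutePath().normalize();
            Path b = Files.createDirectories(root.resolve("a").resolve("b"));
            Files.createFile(root.resolve("a").resolve("starfish.conf"));
            Files.createFile(b.resolve("starfish.conf"));
            List<String> relative = new ArrayList<>();
            for (Path conf : confChain(b)) {
                relative.add(root.relativize(conf).toString().replace('\\', '/'));
            }
            return relative;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the demo, the file in directory a comes before the file in a/b, so settings in a deeper directory can override those made higher up.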

Replace Mode

Finally, if we want to produce a version of the Java code without preprocessing code, we can use the Starfish replace mode. In this mode, the preprocessing code is removed, as well as the markup around the generated code. We must specify an output file in the replace mode because we normally do not want to permanently lose the preprocessing code. For example, if we run the following command:
 $ starfish -replace -o=release/simple.java simple.java
on the above file, in which the $Version variable is set to the value "Release", the resulting file release/simple.java would have the following contents:
/**
   A simple Java file.
*/
// Uncomment version:



public class simple {

  public static int main(String[] args) {

        System.out.println("Release version");

    return 0;
  }
}
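
The effect of the replace mode on the markup can be imitated with a toy Java program. This is a sketch under strong assumptions, not Starfish's implementation: the class ReplaceModeSketch and its strip method are hypothetical, and instead of evaluating the Perl snippets the sketch simply keeps whatever output is already stored between //+ and //-.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceModeSketch {
    // A commented snippet "//<? ... !>", optionally followed by generated
    // output "//+ ... //-". Group 1 captures the generated text, if any.
    private static final Pattern SNIPPET = Pattern.compile(
        "//<\\?.*?!>(?://\\+\\n(.*?)//-)?", Pattern.DOTALL);

    // Removes snippets and markers, keeping only the generated text.
    static String strip(String source) {
        return SNIPPET.matcher(source).replaceAll(m ->
            m.group(1) == null ? "" : Matcher.quoteReplacement(m.group(1)));
    }

    public static void main(String[] args) {
        String source =
              "// Uncomment version:\n"
            + "//<? # $Version = 'Test';    !>\n"
            + "//<?   $Version = 'Release'; !>\n"
            + "\n"
            + "    //<? $O = (Perl code);\n"
            + "    //!>//+\n"
            + "    System.out.println(\"Release version\");//-\n";
        System.out.print(strip(source));
    }
}
```

The snippets that produced no output become empty lines, and the generated statement takes the place of its snippet, which also accounts for the deeper indentation of the println line in the listing above.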

Conclusion

This is the end of the example. Starfish can be found as the Text::Starfish Perl module on CPAN and on the project's main web site.
created: 2020-06-02, last update: 2020-06-13

© 2020-2023 Vlado Keselj, last update: 14-Feb-2022