A reader for Portable Anymap (PNM) files

This interface provides a simple abstraction for reading portable pixmaps, portable graymaps, and portable bitmaps. The file formats are documented in the section 5 man pages for ppm, pgm, and pbm. Read the pages! (For example, try typing man 5 pgm.)

Each format describes a rectangular array of pixels: for a portable pixmap (ppm), each pixel is represented by three numbers giving the red, green, and blue intensities of the pixel; for a portable graymap (pgm), each pixel is represented by a single number giving the light intensity of the pixel, where higher intensities are whiter; for a portable bitmap (pbm), each pixel is represented by a zero or a one, where 1 is black.

In a pixmap or graymap, each number is represented as a scaled integer (fraction). Each pixel has its own numerator, but they all share the same denominator, which appears in the file's header. In the PNM documentation, this denominator is called maxval. Why? A brightness is a real number between 0 and 1, so when it is represented as a fraction (scaled integer), the largest possible numerator equals the denominator. Hence that shared denominator is also the maximum value of any numerator.
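
To make the scaling concrete, here is a minimal sketch of how a client might turn a numerator into a brightness. The function name brightness is hypothetical; it is not part of this interface, which delivers only the raw numerators and the shared denominator.

```c
#include <assert.h>

/* Hypothetical helper, not part of the Pnmrdr interface: convert a
 * numerator read from the file into a brightness between 0 and 1. */
double brightness(unsigned numerator, unsigned denominator)
{
        assert(denominator > 0);           /* a valid maxval is never zero */
        assert(numerator <= denominator);  /* numerator may not exceed maxval */
        return (double)numerator / (double)denominator;
}
```

With a denominator of 255, a numerator of 255 is full brightness (1.0) and a numerator of 0 is none (0.0).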

The PNM documentation describes two different file formats: binary and ASCII text representations of the pixel values. The latter format is described as ``rare,'' but it permits you to hand edit an image file, which can be useful for testing. This package will read either format.

A reader is created from a file, and the reader tells a client the size of the image, the type of each pixel, and the denominator used to scale the representations of intensity in the image. The reader also gives the client the opportunity to read the unsigned integer numerator(s) for each pixel.

Interface

<pnmrdr.h>=
#ifndef PNMRDR_INCLUDED
#define PNMRDR_INCLUDED
#include "except.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define T Pnmrdr_T
typedef struct T *T;

<exported exceptions>
<exported structure and enumeration types>
<exported functions>
#undef T
#endif

To start reading an image, we need an open file handle of C type pointer-to-FILE (FILE *). If you're reading from standard input, the C library provides the file handle stdin. Otherwise the way to get an open file handle is with the standard library function fopen. You should open an image file for binary reading, i.e., with mode "rb".
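
One way a client might choose between standard input and a named file is sketched below. The helper open_image is hypothetical, and it assumes the convention that the image file name, if any, is the program's first command-line argument.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: with no file name on the command line, read
 * from stdin; otherwise open the named file for binary reading.
 * Returns NULL if the file cannot be opened. */
FILE *open_image(int argc, char *argv[])
{
        assert(argc >= 1 && argv != NULL);
        if (argc == 1)
                return stdin;
        return fopen(argv[1], "rb");
}
```

The caller should check for NULL before handing the result to Pnmrdr_new, and should close any file it opened once the reader has been freed.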

Just because we have an open file handle doesn't mean we actually have an image---the data in the file might not be in the right format. To signal this case we define the exception Pnmrdr_Badformat. (This interface uses the exception mechanism described in Chapter 4 of David Hanson's book C Interfaces and Implementations.)

<exported exceptions>= (<-U) [D->]
extern const Except_T Pnmrdr_Badformat;   /* raised when not a pnm file   */
<exported functions>= (<-U) [D->]
extern T Pnmrdr_new(FILE *fp);            /* raises Pnmrdr_Badformat      */

It is an unchecked run-time error to call Pnmrdr_new twice on the same file handle.

Function Pnmrdr_new reads the header and gets information about the type and size of the map, which is represented as follows:

<exported structure and enumeration types>= (<-U)
typedef enum { Pnmrdr_bit = 1, Pnmrdr_gray = 2, Pnmrdr_rgb = 3 } 
        Pnmrdr_maptype;

typedef struct {
        Pnmrdr_maptype type;
        unsigned width, height;
        unsigned denominator;     /* (gray & color) used to scale integers  */
                                  /* to be read                             */
} Pnmrdr_mapdata;

extern char *Pnmrdr_maptype_names[];

<pnmrdr.c>= [D->]
char *Pnmrdr_maptype_names[] = {
        "invalid map type 0", "bitmap", "graymap", "pixmap"
};

You can query an open reader about its data:

<exported functions>+= (<-U) [<-D->]
extern Pnmrdr_mapdata Pnmrdr_data(T rdr);

Once the reader is open, we can read the integer numerators (or bits) from the file. For a bitmap or graymap, clients should read one integer per pixel; for a pixmap, clients should read three integers per pixel. The integers are read as described in the man pages: they are in ``English reading order,'' which is to say that the top left pixel is read first, followed by the remaining pixels in that row, followed by the next row, left to right, and so on.
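
The reading order can be expressed as arithmetic on the position of an integer in the stream. The sketch below is only an illustration: the type struct position and the function locate are hypothetical names, and the channel order (red, green, blue) follows the ppm description above.

```c
#include <assert.h>

/* Hypothetical illustration type; the real interface exposes only
 * Pnmrdr_get and Pnmrdr_mapdata. */
struct position {
        unsigned row, col;
        unsigned channel;   /* 0 = red, 1 = green, 2 = blue; always 0 for
                             * a graymap or bitmap */
};

/* Map k, the zero-based index of an integer in reading order, to the
 * pixel and channel it describes.  ints_per_pixel is 3 for a pixmap
 * and 1 for a graymap or bitmap. */
struct position locate(unsigned k, unsigned width, unsigned ints_per_pixel)
{
        assert(width > 0 && ints_per_pixel > 0);
        struct position p;
        unsigned pixel = k / ints_per_pixel;   /* which pixel, row-major */
        p.channel = k % ints_per_pixel;
        p.row = pixel / width;
        p.col = pixel % width;
        return p;
}
```

For a pixmap two pixels wide, the integer at index 7 is the green intensity of the first pixel in the second row.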

If a client tries to read more integers than are there, the software raises the exception Pnmrdr_Count; that is, this exception is raised either when a client reads beyond the last expected integer or when an image file contains too few integers, and the client reads beyond the last integer in the file.

<exported functions>+= (<-U) [<-D->]
extern unsigned Pnmrdr_get(T rdr);  /* raises Pnmrdr_Count if exhausted     */
<exported exceptions>+= (<-U) [<-D]
extern const Except_T Pnmrdr_Count; /* raised when wrong number of          */
                                    /* pixels read                          */

Finally, when the reader is freed, Pnmrdr_free raises Pnmrdr_Count if only some of the available pixels have been read.

<exported functions>+= (<-U) [<-D]
extern void Pnmrdr_free(T *rdr);    /* raises Pnmrdr_Count unless either    */
                                    /* no pixels or all pixels were read    */

Architecture and implementation strategy

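Representation is the essence of programming. We represent a reader as a pointer to a structure containing the map data read from the header, the source file handle, and the state needed to deliver the next integer; the fields are declared in struct T in the implementation below.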

Reading uses the standard C library functions fopen, fscanf, and getc.

Implementation

<pnmrdr.c>+= [<-D]
#include <stdio.h>
#include <stdbool.h>
#include "assert.h"
#include "mem.h"
#include "pnmrdr.h"

#define T Pnmrdr_T
struct T {
        Pnmrdr_mapdata data;
        FILE *source;
        const char *plain_format;     /* for a plain format, format
                                       * string used to read a pixel:
                                       * %1d for bitmaps; %u for others.
                                       * for a raw format, NULL.
                                       */
        unsigned (*read)(T);          /* read next integer                  */
        unsigned ints_left;
        bool     got_an_int;
        unsigned bits_left_in_row;    /* number of bits left in current row */
        unsigned char current_byte;   /* last byte read in raw format       */
        unsigned char next_bit_mask;  /* mask of next bit to be read 
                                       * (0 if all bits read)
                                       */
};

#include "stackoverflow.h"
<exceptions>
<private functions>
<functions>

Return of data is simple.

<functions>= (<-U) [D->]
Pnmrdr_mapdata Pnmrdr_data (T rdr) { return rdr->data; }

I decided not to use the I/O library that comes with the netpbm package. This decision is justified on two grounds:

For the plain format, by the time Pnmrdr_get is called the hard work has already been done. Half the code is error checking; the real work is calling fscanf and returning n.

<private functions>= (<-U) [D->]
static unsigned read_plain(T rdr)
{
        unsigned n;
        int rc = fscanf(rdr->source, rdr->plain_format, &n);

        if (rc != 1)
                RAISE(Pnmrdr_Badformat);
        return n;
}

The raw format is a bit more exciting. In particular, for a bitmap, we must read a byte at a time from the file, but Pnmrdr_get delivers just one bit at a time. Making this work uses shifting and bitwise operations that are covered in the middle of COMP 40.

<private functions>+= (<-U) [<-D->]
static unsigned read_raw_bit(T rdr) 
{
        if (rdr->bits_left_in_row == 0) {
                rdr->bits_left_in_row = rdr->data.width;
                rdr->next_bit_mask = 0;
        }
        if (rdr->next_bit_mask == 0) {
                int c = getc(rdr->source);
                if (c == EOF)
                        RAISE(Pnmrdr_Count);
                rdr->next_bit_mask = 1<<7;
                rdr->current_byte = c;
        }
        unsigned bit = (rdr->current_byte & rdr->next_bit_mask) != 0;
        rdr->bits_left_in_row--;
        rdr->next_bit_mask >>= 1;
        return bit;
}
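
The same mask-and-shift idea can be tried without a file. The sketch below unpacks one row of a raw bitmap from a buffer in memory; unpack_row is a hypothetical helper, not part of this package. Because each call starts with an empty mask, any unused low-order bits in the row's last byte are skipped, just as read_raw_bit skips them by resetting the mask at the start of each row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: unpack one row of a raw bitmap.  `packed` holds
 * the row's bytes, `width` is the number of pixels in the row, and
 * `bits` receives one 0/1 value per pixel. */
void unpack_row(const unsigned char *packed, unsigned width,
                unsigned char *bits)
{
        assert(packed != NULL && bits != NULL);
        unsigned char mask = 0;      /* 0 means "fetch the next byte" */
        unsigned char current = 0;   /* byte currently being unpacked */
        size_t next = 0;             /* index of the next unread byte */
        for (unsigned i = 0; i < width; i++) {
                if (mask == 0) {
                        current = packed[next++];
                        mask = 1 << 7;   /* most significant bit first */
                }
                bits[i] = (current & mask) != 0;
                mask >>= 1;
        }
}
```

A row nine pixels wide occupies two bytes; only the top bit of the second byte carries a pixel.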

A raw graymap or pixmap represents an integer either as a single byte or as a pair of bytes, where the first byte is more significant.

<private functions>+= (<-U) [<-D->]
static unsigned read_raw_char(T rdr)
{
        int c = getc(rdr->source);
        if (c == EOF)
                RAISE(Pnmrdr_Count);
        return c;
}

static unsigned read_raw_pair(T rdr)
{
        int hi = getc(rdr->source);
        int lo = hi == EOF ? EOF : getc(rdr->source);
        if (lo == EOF)
                RAISE(Pnmrdr_Count);
        return (hi << 8) + lo;
}

Function Pnmrdr_get uses the internal function pointer rdr->read, which is determined by the format. This function is always read_plain, read_raw_bit, read_raw_char, or read_raw_pair.

<functions>+= (<-U) [<-D->]
unsigned Pnmrdr_get (T rdr)
{
        assert(rdr && rdr->source);
        if (rdr->ints_left > 0) {
                rdr->got_an_int = true;
                rdr->ints_left--;
                return rdr->read(rdr);
        } else {
                RAISE(Pnmrdr_Count);
                /* code not reached, but the compiler doesn't know that, so: */
                return 0;
        }
}

We always free the memory associated with a reader, but if the reader is freed with some pixels unread, this code raises the exception Pnmrdr_Count.

<functions>+= (<-U) [<-D->]
void Pnmrdr_free (T *prdr)
{
        unsigned leftover;
        assert(prdr && *prdr);
        leftover = (*prdr)->ints_left;
        bool got_an_int = (*prdr)->got_an_int;
        FREE(*prdr);
        if (leftover > 0 && got_an_int)
                RAISE(Pnmrdr_Count);
}

Function Pnmrdr_new determines the input format and the size of the image, and it establishes the proper read functions and invariants for Pnmrdr_get. All image formats have this structure in common:

magic white width white height white [denominator white]
(A bitmap has no denominator.) Moreover, a # character appearing before denominator (or width) marks the start of a comment, which ends at the next newline. This reader uses the auxiliary function read_token to get the next token while discarding comments.

<functions>+= (<-U) [<-D]
T Pnmrdr_new (FILE *fp) {
  <if this is the first call, install the stack-overflow detector>
  assert(fp);
  T rdr = ALLOC(sizeof(*rdr));
  TRY
    rdr->source = fp;

    /* read first token and check for a pbmplus 'magic number' */
    char token[20];   // used to read 'magic number'
    read_token(rdr->source, "%10s", token); /* 10 is much smaller than sizeof(token) */
    if (token[0] != 'P' || token[1] < '1' || token[1] > '6' || token[2] != '\0') {
      RAISE(Pnmrdr_Badformat);
      return NULL; // code not reached, but the compiler doesn't know that
    } else {
      /* we have a magic number; set type, integer format, and reader function */
      <use token[1] to set fields rdr->data.type and rdr->plain_format>
      /* read the rest of the header and return success */
      read_token(rdr->source, "%u", &rdr->data.width);
      read_token(rdr->source, "%u", &rdr->data.height);
      if (rdr->data.type == Pnmrdr_bit)
        rdr->data.denominator = 1;
      else
        read_token(rdr->source, "%u", &rdr->data.denominator);
      int c = getc(fp); // single whitespace or newline following denominator
      if (!isspace(c))
        RAISE(Pnmrdr_Badformat);
      <use token[1] to and rdr->data.denominator to set field rdr->read>
      rdr->ints_left = rdr->data.width * rdr->data.height;
      rdr->got_an_int = false;
      if (rdr->data.type == Pnmrdr_rgb)
        rdr->ints_left *= 3;
      RETURN rdr;
    }
  ELSE
    FREE(rdr);
    RERAISE;
    return NULL; // code not reached, but the compiler doesn't know that
  END_TRY;
}

Setup is different depending on whether the representation of the pixels is ``plain'' or ``raw.'' There are therefore three cases to consider:

Both plain and raw images come in three flavors: bitmap, graymap, and pixmap. The second character in token determines both the representation and the flavor.

<use token[1] to set fields rdr->data.type and rdr->plain_format>= (<-U)
switch (token[1]) {
  case '1': rdr->data.type = Pnmrdr_bit;  rdr->plain_format = "%1d"; break;
  case '2': rdr->data.type = Pnmrdr_gray; rdr->plain_format = "%u";  break;
  case '3': rdr->data.type = Pnmrdr_rgb;  rdr->plain_format = "%u";  break;
  case '4': rdr->data.type = Pnmrdr_bit;  rdr->plain_format = NULL;  break;
  case '5': rdr->data.type = Pnmrdr_gray; rdr->plain_format = NULL;  break;
  case '6': rdr->data.type = Pnmrdr_rgb;  rdr->plain_format = NULL;  break;
  default : assert(0); break;
}

The format also determines which function is used to read the next integer.

<use token[1] to and rdr->data.denominator to set field rdr->read>= (<-U)
switch(token[1]) {
  case '1': case '2': case '3':
    rdr->read = read_plain;
    break;
  case '4':
    rdr->bits_left_in_row = 0; // trigger read of next byte
    rdr->read = read_raw_bit;
    break;
  case '5': case '6':
    rdr->read = rdr->data.denominator > 255 ? read_raw_pair : read_raw_char;
    break;
}

<exceptions>= (<-U)
const Except_T Pnmrdr_Count = { "Read too many or too few integers in pnm map" };
const Except_T Pnmrdr_Badformat = { "Input is not a correctly formatted pnm map" };

Function read_token peeks ahead in the input and scans past any comments, then calls the standard library function fscanf.

<private functions>+= (<-U) [<-D]
static void read_token(FILE *source, const char *fmt, void *address) 
{
        int rc;
        <scan past comments in file source>
                  rc = fscanf(source, fmt, address);
        if (rc != 1)
                RAISE(Pnmrdr_Badformat);
}

This code is a very simple lexical scanner, probably intelligible only to a compiler writer. It scans past whitespace and looks for the '#' character that would mark the start of a comment. Eventually it reaches end of file (bad news) or a non-space, non-'#' character, in which case it sneakily puts back the character using ungetc and exits the loop.

<scan past comments in file source>= (<-U)
{ int c;
  for (;;) { /* skip comments */
    do { c = getc(source); } while (c != EOF && isspace(c)); /* skip whitespace */
    if (c == '#') { /* scan to newline */
      do { c = getc(source); } while (c != EOF && c != '\n');
      if (c == EOF)
        RAISE(Pnmrdr_Badformat);
      /* otherwise continue the for loop */
    } else if (c == EOF) {
      RAISE(Pnmrdr_Badformat);
    } else { /* put back the non-space, non-comment character c & we're done */
      ungetc(c, source); 
      break; 
    }
  }
}
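
To experiment with the scanner outside the package, the loop can be wrapped in a function that reports failure with a return value instead of an exception. This is only a sketch: skip_to_token is a hypothetical name, and returning false stands in for raising Pnmrdr_Badformat.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-alone version of the scanner: skip whitespace and
 * '#' comments, leaving the stream positioned at the first character
 * of the next token.  Returns false if end of file is reached first. */
bool skip_to_token(FILE *source)
{
        assert(source != NULL);
        for (;;) {
                int c;
                do { c = getc(source); } while (c != EOF && isspace(c));
                if (c == '#') {          /* comment: scan to newline */
                        do { c = getc(source); } while (c != EOF && c != '\n');
                        if (c == EOF)
                                return false;
                } else if (c == EOF) {
                        return false;
                } else {
                        ungetc(c, source);   /* push back exactly one char */
                        return true;
                }
        }
}
```

After a successful call, a single fscanf with a format such as "%u" reads the token that follows.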

I'm careful to ensure that ungetc is not called twice consecutively on the file; to ``unget'' more than one character would be an unchecked run-time error!

This code helps detect stack overflow using the GNU library libsigsegv.

<if this is the first call, install the stack-overflow detector>= (<-U)
do {
  static int initialized;
  if (!initialized) {
    initialized = 1;
    report_stack_overflow();
  }
} while(0);