Development of modules for nginx
Conference HighLoad++ 2008
Valery Kholodkov
Translation by Antoine Bonavita with some help from Google Translate and Yahoo Babel Fish.
Please treat this translation as "beta" as it's being reviewed by the author.
Original version in Russian.

0. Introduction

The material you are reading is based on my own research and on the nginx mailing list.

Since I am not the author of nginx and did not participate in the discussions of implementation details, this information may not be 100% accurate.

You have been warned!

This material is organized as follows:

1. Asynchronous servers

The main feature of nginx implementation is that all network I/O operations are performed asynchronously by worker processes. This gives the following advantages:

The downside is that programming for nginx requires a special approach: you cannot use blocking I/O. As a consequence, most existing libraries and code samples cannot be used as they are.

At the beginning of each cycle, nginx worker processes execute a special system call which polls on all sockets for an event. An event can be one of:

For a cycle to complete, the triggering event must be processed: for a connection request, a connection must be established; for data available in the input buffer, data must be retrieved and processed; for space available in the output buffer, data must be added to the socket's output buffer. All these events are handled using non-blocking operations.
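
To make "non-blocking operation" concrete, here is a minimal sketch that does not use the nginx API at all. It relies only on POSIX calls (pipe, fcntl, read); the function names sample_set_nonblocking and sample_read_would_block are hypothetical and exist only for this illustration. Reading from an empty descriptor in non-blocking mode returns immediately with EAGAIN instead of stalling the process:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int sample_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1) {
        return -1;
    }

    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Try to read from an empty pipe in non-blocking mode.
 * Returns 1 if read() reported "would block" instead of stalling,
 * 0 if it behaved differently, -1 if the setup failed.
 */
static int sample_read_would_block(void)
{
    int      fds[2];
    char     byte;
    ssize_t  n;
    int      result;

    if (pipe(fds) == -1) {
        return -1;
    }

    if (sample_set_nonblocking(fds[0]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    n = read(fds[0], &byte, 1);   /* nothing has been written yet */
    result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    close(fds[0]);
    close(fds[1]);

    return result;
}
```

A worker built on this principle would simply go back to polling when it sees EAGAIN and retry once the descriptor is reported ready.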

While processing an event on one socket, a worker process does not process events on other sockets. As a consequence, the longer the processing of an event takes, the "hungrier" the other sockets get: incoming data accumulates in input buffers and output buffers run empty. On the client side, this looks like a "hang". Therefore, to prevent socket starvation, server components must be implemented using the following principles:

Disk I/O on UNIX is always blocking (except through the POSIX realtime API). Following the principles above, it should be avoided. In practice, however, caching and read-ahead reduce blocking to a minimum.

Due to the implementation constraints described above, full-scale web applications are difficult to implement using the nginx API. Therefore, this API should only be used to develop nginx modules.

2. nginx API

2.1. Memory management

In nginx, memory management is implemented using memory pools. A pool is a sequence of pre-allocated blocks on the heap. A pool is tied to an object (e.g. a request or an event) that determines the lifetime of objects allocated in the pool.

Pools are used for the following reasons:

To allocate memory, use the following functions:

void *ngx_palloc(ngx_pool_t *pool, size_t size);
void *ngx_pcalloc(ngx_pool_t *pool, size_t size);

pool - the pool from which memory is allocated;
size - size of memory to allocate in bytes;
result - a pointer to the allocated memory or NULL, if unable to allocate.

The ngx_pcalloc function fills the allocated memory with zeros (ngx_palloc does not).
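
The following stand-alone sketch illustrates the idea behind pool allocation. It is not nginx's implementation: sample_pool_t and the sample_* functions are hypothetical stand-ins, the pool consists of a single block, and the 16-byte rounding is an assumption made to keep returned pointers aligned. Allocation merely advances a pointer inside a pre-allocated block, and everything is released at once when the pool is destroyed:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SAMPLE_ALIGNMENT 16   /* assumption: sufficient for any basic type */

/* Hypothetical stand-in for ngx_pool_t: one pre-allocated block. */
typedef struct {
    unsigned char  *start;   /* beginning of the block */
    unsigned char  *last;    /* first free byte */
    unsigned char  *end;     /* one past the end of the block */
} sample_pool_t;

static sample_pool_t *sample_create_pool(size_t size)
{
    sample_pool_t *pool = malloc(sizeof(sample_pool_t));

    if (pool == NULL) {
        return NULL;
    }

    pool->start = malloc(size);
    if (pool->start == NULL) {
        free(pool);
        return NULL;
    }

    pool->last = pool->start;
    pool->end = pool->start + size;

    return pool;
}

/* In the spirit of ngx_palloc: advance the pointer, do not clear memory. */
static void *sample_palloc(sample_pool_t *pool, size_t size)
{
    void *p;

    /* round the request up so that the next allocation stays aligned */
    size = (size + SAMPLE_ALIGNMENT - 1) & ~(size_t) (SAMPLE_ALIGNMENT - 1);

    if ((size_t) (pool->end - pool->last) < size) {
        return NULL;   /* a real pool would chain a new block here */
    }

    p = pool->last;
    pool->last += size;

    return p;
}

/* In the spirit of ngx_pcalloc: same as above, but zero-filled. */
static void *sample_pcalloc(sample_pool_t *pool, size_t size)
{
    void *p = sample_palloc(pool, size);

    if (p != NULL) {
        memset(p, 0, size);
    }

    return p;
}

/* Every object allocated from the pool disappears at once. */
static void sample_destroy_pool(sample_pool_t *pool)
{
    free(pool->start);
    free(pool);
}
```

Because individual objects are never freed one by one, allocation is cheap and leaks are avoided as long as the pool itself is destroyed together with the object it is tied to.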

In the unlikely event that you need to release memory, use the function ngx_pfree:

ngx_int_t ngx_pfree(ngx_pool_t *pool, void *p);

pool - freed space will be returned to this pool;
p - the pointer to memory that is being released;
result - NGX_OK if memory was actually freed, NGX_DECLINED, if nginx did not free any memory from the heap.

To register a destructor (for closing file handles, or deleting files), you should use the following structures and functions:

typedef void (*ngx_pool_cleanup_pt)(void *data);

typedef struct ngx_pool_cleanup_s ngx_pool_cleanup_t;
struct ngx_pool_cleanup_s {
    ngx_pool_cleanup_pt handler;
    void *data;
    ngx_pool_cleanup_t *next;
};

ngx_pool_cleanup_t *ngx_pool_cleanup_add(ngx_pool_t *p, size_t size);

p - the pool the destructor should be registered with;
size - size of context structure, which will be transferred to the destructor;
result - a pointer to the allocated ngx_pool_cleanup_t structure if its creation was successful; NULL otherwise.

The fields of ngx_pool_cleanup_t have the following meanings:
handler - the handler to be called when the pool is destroyed. ngx_pool_cleanup_add leaves it unset (NULL); the caller must assign it, as in the example below;
data - memory of the specified size, allocated by ngx_pool_cleanup_add and intended for use as the context passed to the destructor;
next - points to the next destructor in the chain. Callers should treat this as read-only.

Example of use:

static void ngx_sample_cleanup_handler(void *data);

static ngx_int_t ngx_http_sample_module_handler(ngx_http_request_t *r)
{
    ngx_pool_cleanup_t    *cln;
    ngx_sample_cleanup_t  *scln;

    cln = ngx_pool_cleanup_add(r->pool, sizeof(ngx_sample_cleanup_t));

    if (cln == NULL) {
        return NGX_ERROR;
    }

    //  Actually set the handler.
    cln->handler = ngx_sample_cleanup_handler;

    scln = cln->data;

    [... initialize scln ...]

    return NGX_OK;
}

static void ngx_sample_cleanup_handler(void *data)
{
    //  Retrieve the scln we allocated in the previous function.
    ngx_sample_cleanup_t        *scln = data;

    [... use scln ...]
}
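
The mechanism can also be tried outside nginx with the following sketch. All names (sample_cpool_t, sample_cleanup_add, sample_destroy_pool, sample_record) are hypothetical stand-ins built on malloc. The sketch prepends each new destructor to the chain, so destructors run in reverse order of registration; to my understanding nginx behaves the same way, but check the sources before relying on it:

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*sample_cleanup_pt)(void *data);

typedef struct sample_cleanup_s sample_cleanup_t;
struct sample_cleanup_s {
    sample_cleanup_pt  handler;
    void              *data;
    sample_cleanup_t  *next;
};

/* Hypothetical pool that only tracks its destructor chain. */
typedef struct {
    sample_cleanup_t  *cleanup;
} sample_cpool_t;

/* Register a destructor with a context of the given size. */
static sample_cleanup_t *sample_cleanup_add(sample_cpool_t *pool, size_t size)
{
    sample_cleanup_t *c = malloc(sizeof(sample_cleanup_t));

    if (c == NULL) {
        return NULL;
    }

    c->data = size ? calloc(1, size) : NULL;
    if (size && c->data == NULL) {
        free(c);
        return NULL;
    }

    c->handler = NULL;         /* the caller is expected to set this */
    c->next = pool->cleanup;   /* prepend to the chain */
    pool->cleanup = c;

    return c;
}

/* Run every registered destructor, then release the chain. */
static void sample_destroy_pool(sample_cpool_t *pool)
{
    sample_cleanup_t *c, *next;

    for (c = pool->cleanup; c != NULL; c = next) {
        next = c->next;
        if (c->handler != NULL) {
            c->handler(c->data);
        }
        free(c->data);
        free(c);
    }

    pool->cleanup = NULL;
}

/* Example context and handler: record the order in which handlers run. */
typedef struct {
    int   id;
    int  *order;
    int  *count;
} sample_ctx_t;

static void sample_record(void *data)
{
    sample_ctx_t *ctx = data;

    ctx->order[(*ctx->count)++] = ctx->id;
}
```

Registering two destructors with ids 1 and 2 and then destroying the pool calls the handler for id 2 first.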

2.2. Arrays

Arrays in nginx are represented by the following structure:

struct ngx_array_s {
    void        *elts;
    ngx_uint_t   nelts;
    size_t       size;
    ngx_uint_t   nalloc;
    ngx_pool_t  *pool;
};

typedef struct ngx_array_s ngx_array_t;

pool - the memory pool from which elements will be allocated;
elts - a pointer to the elements;
nelts - the number of elements in the array at the moment;
size - the size in bytes of each element;
nalloc - the number of elements memory has been allocated for at this time.

2.2.1. ngx_array_create

To create an array, you should use ngx_array_create:

ngx_array_t *ngx_array_create(ngx_pool_t *p, ngx_uint_t n, size_t size);

p - pool, used to allocate memory for the array and its elements;
n - number of elements for which memory should be allocated;
size - the size in bytes of each element;
result - a pointer to a newly allocated array, NULL if allocation failed.

Example:

typedef struct {
    [...]
} ngx_sample_struct_t;

{
    ngx_array_t *v;

    v = ngx_array_create(pool, 10, sizeof(ngx_sample_struct_t));

    if (v == NULL) {
        return NGX_ERROR;
    }
}

This code creates an array with room for 10 elements of type ngx_sample_struct_t. The memory for storing the elements is allocated from the pool, but the array is still empty (nelts is 0).

2.2.2. ngx_array_init

If the memory for the ngx_array_t is already allocated and you just want to allocate memory for its elements, you should use ngx_array_init:

ngx_int_t ngx_array_init(ngx_array_t *array, ngx_pool_t *pool, ngx_uint_t n, size_t size);

array - pointer to the already allocated ngx_array_t structure;
pool - pool, used to allocate memory for the elements;
n - number of elements to allocate memory for;
size - the size in bytes of each element;
result - NGX_OK, if allocation was successful; NGX_ERROR otherwise.

Example:

typedef struct {
    ngx_array_t v;
    [...]
} ngx_sample_envelope_t;

typedef struct {
    [...]
} ngx_sample_item_t;

{
    ngx_sample_envelope_t t; // Allocates memory for t.v on the stack.

    if (ngx_array_init(&t.v, pool, 10, sizeof(ngx_sample_item_t)) != NGX_OK) {
        return NGX_ERROR;
    }
}

2.2.3. ngx_array_push*

To add elements at the end of the array, you should use the functions ngx_array_push and ngx_array_push_n:

void *ngx_array_push(ngx_array_t *a);
void *ngx_array_push_n(ngx_array_t *a, ngx_uint_t n);

a - the array to which elements will be added;
n - number of elements to be allocated (one in the case of ngx_array_push);
result - a pointer to the first added element; NULL if allocation failed.

If enough space is already allocated, these functions just increase the nelts field of the array. Otherwise, they first try to extend the array in place, using contiguous free space in the pool. If no such space is available, a new block is allocated (doubling the capacity) and the existing elements are copied into it. As a consequence: 1/ pointers to elements may become invalid after a push, so always access elements through the elts field of the array; 2/ it is a good idea to size your arrays correctly up front in order to avoid the performance penalty of reallocation and copying.

Example:

typedef struct {
    [...]
} ngx_sample_struct_t;

{
    ngx_array_t          *v;
    ngx_sample_struct_t  *h;
    
    [...create array...]

    h = ngx_array_push(v);
    if (h == NULL) {
        return NGX_ERROR;
    }
    
    [...use h...]
}
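
The growth behaviour described above can be observed with a small stand-alone sketch. sample_array_t and its functions are hypothetical stand-ins that use malloc instead of a pool and always take the "allocate a block twice as large and copy" path when the array is full (the in-place extension that a pool may allow is not modelled):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ngx_array_t (without the pool field). */
typedef struct {
    void    *elts;     /* storage for the elements */
    size_t   nelts;    /* elements currently in use */
    size_t   size;     /* size of one element in bytes */
    size_t   nalloc;   /* elements the storage can hold */
} sample_array_t;

/* Allocate room for n elements (n must be greater than 0). */
static int sample_array_init(sample_array_t *a, size_t n, size_t size)
{
    a->elts = malloc(n * size);
    if (a->elts == NULL) {
        return -1;
    }

    a->nelts = 0;
    a->size = size;
    a->nalloc = n;

    return 0;
}

/* Return a pointer to a new last element, growing the storage if needed. */
static void *sample_array_push(sample_array_t *a)
{
    void *new_elts, *elt;

    if (a->nelts == a->nalloc) {
        /* full: allocate twice the capacity and copy the elements over */
        new_elts = malloc(2 * a->nalloc * a->size);
        if (new_elts == NULL) {
            return NULL;
        }

        memcpy(new_elts, a->elts, a->nelts * a->size);
        free(a->elts);   /* a pool would keep the old block until destroyed */
        a->elts = new_elts;
        a->nalloc *= 2;
    }

    elt = (unsigned char *) a->elts + a->nelts * a->size;
    a->nelts++;

    return elt;
}
```

After a push that triggers growth, a->elts points to a different block, which is why element pointers obtained earlier must not be reused.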

2.2.4. ngx_array_destroy

To destroy the array (return its memory to the pool), you should use ngx_array_destroy. The memory returned to the pool will actually be available for reuse only if it was at the end of the pool.

void ngx_array_destroy(ngx_array_t *a);

a - the array to be destroyed;

2.3. Lists

See Catap's original blog post on nginx lists (in Russian).

2.3.1 Overview

In nginx, lists are implemented using this structure:

typedef struct {
    ngx_list_part_t  *last;
    ngx_list_part_t   part;
    size_t            size;
    ngx_uint_t        nalloc;
    ngx_pool_t       *pool;
} ngx_list_t;

last - a pointer to the last part of the list;
part - the first part of the list;
size - the size of list elements. If you need to store strings, then the size will be equal to sizeof(ngx_str_t)
nalloc - the number of elements to be allocated for each part.
pool - a pointer to a memory pool used for allocation of objects in this list.

A list is made of many parts (each part is a dedicated block of memory allocated to store a certain number of list elements):

typedef struct ngx_list_part_s  ngx_list_part_t;
struct ngx_list_part_s {
    void             *elts;
    ngx_uint_t        nelts;
    ngx_list_part_t  *next;
};

elts - pointer to the memory where the elements of the list stored in this part are located.
nelts - the number of elements actually stored in this part
next - pointer to the next part

So, a list is made of parts (at least one), each part actually storing elements. A part records the number of elements it currently holds and a pointer to the next part in the list (NULL for the last part, which is the one that receives new elements). All parts have the same capacity: each can hold up to ngx_list_t.nalloc elements.

2.3.2 Working with lists

2.3.2.1 Creation

Lists are created using:

ngx_list_t *ngx_list_create(ngx_pool_t *pool, ngx_uint_t n, size_t size);

size - the size of one element,
n - number of elements each part can hold,
pool - a pool used to allocate memory for the list
result - a pointer to the newly created list or NULL if allocation failed

The function creates a new list and initializes it: it allocates memory for itself but also for the first part of the list and the elements this first part will hold.

2.3.2.2 Initialization

If the memory for the ngx_list_t is already allocated (typically on the stack) and you just want to allocate memory for its elements, you should use ngx_list_init:

static ngx_inline ngx_int_t ngx_list_init(ngx_list_t *list, ngx_pool_t *pool, ngx_uint_t n, size_t size)

Its parameters are similar to the parameters of ngx_list_create, except for the first which is a pointer to the list that should be initialized (or reinitialized). The function returns NGX_OK in case of success and NGX_ERROR otherwise.

The function is not safe to call on a list that is already in use: it reinitializes the list unconditionally, so all data previously stored in it is lost.

2.3.2.3 Adding elements

Elements are added using:

void *ngx_list_push(ngx_list_t *list);

list - a pointer to the list where the item is to be added;
result - a pointer to the added element (actually a pointer to a location in the last part where you can record). It's up to the caller to cast this to the appropriate type.

If the last part is full, a new one is created and the function returns a pointer to the first element of the newly created part.

2.3.2.4 Browsing the list

To navigate through the list you can use this code:

    part = &list.part;
    data = part->elts;
    for (i = 0 ;;i++) {
        if (i >= part->nelts) {
            if (part->next == NULL) { break; }
            part = part->next;
            data = part->elts;
            i = 0;
        }
        /* use element of list as data[i] */
    }

Replace the comment with your own code, using data[i] as the value of the current element in the list.

2.3.2.5 Example

This function fills a list with the integers 0 to 9 and sums them:

void ngx_list_sample_using(ngx_http_request_t *r) {
    ngx_list_t       list;
    ngx_uint_t       i;
    ngx_list_part_t *part;
    ngx_uint_t      *list_element;
    ngx_uint_t      *data;
    ngx_uint_t       sum = 0;
    /* inits list (as opposed to create it) as it is allocated on stack. */
    if (ngx_list_init(&list, r->pool, 5, sizeof(ngx_uint_t)) == NGX_ERROR) { return; }
    /* fills list with values 0 to 9 */
    for (i = 0; i < 10; i++) {
        /* add one element to list */
        list_element = ngx_list_push(&list);
        if (list_element == NULL) { return; }
        /* sets element to i */
        *list_element = i;
    }
    part = &list.part;
    data = part->elts;
    /* loops through all elements as seen before */
    for (i = 0 ;;i++) {
        if (i >= part->nelts) {
            if (part->next == NULL) { break; }
            part = part->next;
            data = part->elts;
            i = 0;
        }
        /* adds current element to sum */
        sum += data[i];
    }
    /* here sum is 45 */
}
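
The example above needs the nginx headers to compile. The following stand-alone sketch mimics the same structures with malloc so that the part mechanics can be tried in isolation; every sample_* name is a hypothetical stand-in, not part of the nginx API. It also shows an equivalent way to traverse the list with two nested loops:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct sample_list_part_s sample_list_part_t;
struct sample_list_part_s {
    void                *elts;
    size_t               nelts;
    sample_list_part_t  *next;
};

typedef struct {
    sample_list_part_t  *last;
    sample_list_part_t   part;
    size_t               size;
    size_t               nalloc;
} sample_list_t;

static int sample_list_init(sample_list_t *l, size_t n, size_t size)
{
    l->part.elts = malloc(n * size);
    if (l->part.elts == NULL) {
        return -1;
    }

    l->part.nelts = 0;
    l->part.next = NULL;
    l->last = &l->part;
    l->size = size;
    l->nalloc = n;

    return 0;
}

static void *sample_list_push(sample_list_t *l)
{
    sample_list_part_t *last = l->last;
    void               *elt;

    if (last->nelts == l->nalloc) {
        /* the last part is full: append a new part of the same capacity */
        last = malloc(sizeof(sample_list_part_t));
        if (last == NULL) {
            return NULL;
        }

        last->elts = malloc(l->nalloc * l->size);
        if (last->elts == NULL) {
            free(last);
            return NULL;
        }

        last->nelts = 0;
        last->next = NULL;
        l->last->next = last;
        l->last = last;
    }

    elt = (unsigned char *) last->elts + l->size * last->nelts;
    last->nelts++;

    return elt;
}

/* Store 0..9 in parts of 5 elements, then sum them part by part. */
static unsigned sample_list_sum(size_t *nparts)
{
    sample_list_t        list;
    sample_list_part_t  *part, *next;
    unsigned            *data, *elt, sum = 0;
    size_t               i;

    *nparts = 0;

    if (sample_list_init(&list, 5, sizeof(unsigned)) != 0) {
        return 0;
    }

    for (i = 0; i < 10; i++) {
        elt = sample_list_push(&list);
        if (elt == NULL) {
            return 0;   /* error path leaks in this sketch */
        }
        *elt = (unsigned) i;
    }

    for (part = &list.part; part != NULL; part = part->next) {
        data = part->elts;
        for (i = 0; i < part->nelts; i++) {
            sum += data[i];
        }
        (*nparts)++;
    }

    /* with a pool this cleanup would happen when the pool is destroyed */
    free(list.part.elts);
    for (part = list.part.next; part != NULL; part = next) {
        next = part->next;
        free(part->elts);
        free(part);
    }

    return sum;
}
```

With ten elements and a capacity of five per part, the sketch ends up with two parts.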

2.4 Buffer management and queuing

2.4.1 Buffer Management

Buffers are used to track the progress of reception, transmission and processing of data.

The ngx_buf.h file defines the ngx_buf_t type. These buffers can correspond to memory (read-only or not) or to a file.

For writable in-memory buffers, the following fields are of interest:

typedef struct ngx_buf_s ngx_buf_t;
struct ngx_buf_s {
    [...]

    u_char          *pos;    /* top of window */
    u_char          *last;   /* end of window */
    u_char          *start;  /* beginning of buffer */
    u_char          *end;    /* end of buffer */

    unsigned         temporary:1; /* = 1 */

    [...]
};

For read-only in-memory buffers, the fields of interest are:

struct ngx_buf_s {
    [...]

    u_char          *pos;    /* top of window */
    u_char          *last;   /* end of window */
    u_char          *start;  /* beginning of buffer */
    u_char          *end;    /* end of buffer */

    unsigned         memory:1; /* = 1 */

    [...]
};

For the file:

struct ngx_buf_s {
    [...]

    off_t            file_pos;  /* top of window */
    off_t            file_last; /* end of window */
    ngx_file_t      *file;      /* file pointer */

    unsigned         in_file:1; /* = 1 */

    [...]
};

The window is the part of the buffer that is still to be sent or processed, or that has already been received. When filling the buffer, the last pointer is moved toward end. When sending or processing the buffer, the pos pointer is moved toward last (or file_pos toward file_last in the case of a file). When all data has been sent or processed, pos == last (or file_pos == file_last for files). When the buffer is empty, pos == last == start.
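
The pointer movements described above can be sketched with a stand-alone structure that keeps only the four window pointers. sample_buf_t and the sample_buf_* helpers are hypothetical and exist only for this illustration:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the in-memory part of ngx_buf_t. */
typedef struct {
    unsigned char  *pos;     /* start of window */
    unsigned char  *last;    /* end of window */
    unsigned char  *start;   /* beginning of buffer */
    unsigned char  *end;     /* end of buffer */
} sample_buf_t;

/* Attach the buffer to a block of memory; the window starts out empty. */
static void sample_buf_init(sample_buf_t *b, unsigned char *mem, size_t size)
{
    b->start = mem;
    b->end = mem + size;
    b->pos = mem;
    b->last = mem;
}

/* Number of bytes still waiting in the window. */
static size_t sample_buf_size(const sample_buf_t *b)
{
    return (size_t) (b->last - b->pos);
}

/* Filling moves last toward end; returns the number of bytes stored. */
static size_t sample_buf_fill(sample_buf_t *b, const void *src, size_t len)
{
    size_t room = (size_t) (b->end - b->last);

    if (len > room) {
        len = room;
    }

    memcpy(b->last, src, len);
    b->last += len;

    return len;
}

/* Sending or processing moves pos toward last; returns bytes consumed. */
static size_t sample_buf_consume(sample_buf_t *b, size_t len)
{
    size_t avail = sample_buf_size(b);

    if (len > avail) {
        len = avail;
    }

    b->pos += len;

    return len;
}
```

Once sample_buf_consume has caught up with everything that was filled in, pos equals last and the window is empty again.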

In addition, the buffer contains flags that describe how to process the data contained in the buffer:

struct ngx_buf_s {
    [...]

    unsigned         recycled:1; /* buffer is reused after release */
    unsigned         flush:1; /* all buffered data should be processed and transmitted to the next level after treatment of the buffer */
    unsigned         last_buf:1; /* indicates that the buffer is the last in a stream of data */
    unsigned         last_in_chain:1; /* indicates that the buffer is the last in this chain (queue) */
    unsigned         temp_file:1; /* indicates that the buffer is a temporary file */

    [...]
};

In the case of read-only memory, many ngx_buf_t may point to the same underlying data (for example a segment of constant data or the configuration of a module). The window of each structure then indicates the progress of sending/processing the data.

ngx_buf_t structures should be created using the macros:

ngx_buf_t *ngx_alloc_buf(ngx_pool_t *pool);
ngx_buf_t *ngx_calloc_buf(ngx_pool_t *pool);

pool - pool, used to allocate memory for the ngx_buf_t;
result - pointer to the allocated ngx_buf_t if successful and NULL in case of failure. After allocation, the members of the structure still have to be initialized. ngx_calloc_buf also sets the allocated memory to zeroes.

Temporary buffers can be created with:

ngx_buf_t *ngx_create_temp_buf(ngx_pool_t *pool, size_t size);

pool - pool used to allocate memory for the ngx_buf_t structure and the buffer itself;
size - size (in bytes) of the buffer to allocate. For the created buffer, end = start + size.
result - a pointer to a ngx_buf_t structure pointing to the temporary buffer or NULL in case of allocation failure.

The ngx_buf_t returned will have pos = last = start and temporary will be set to 1.

2.4.2 Queue management

Queues (or chains) bind multiple buffers in a sequence that determines the order in which data will be received, processed and sent.

typedef struct ngx_chain_s ngx_chain_t;

struct ngx_chain_s {
    ngx_buf_t    *buf; /* buffer associated with the current link */
    ngx_chain_t  *next; /* next link in the chain */
};

Chains are created using the following structures and functions:

typedef struct {
    ngx_int_t    num;
    size_t       size;
} ngx_bufs_t;

ngx_chain_t *ngx_alloc_chain_link(ngx_pool_t *pool);
ngx_chain_t *ngx_create_chain_of_bufs(ngx_pool_t *pool, ngx_bufs_t *bufs);
ngx_chain_t *ngx_chain_get_free_buf(ngx_pool_t *p, ngx_chain_t **free);

ngx_alloc_chain_link returns one chain link. If the pool holds a previously released link, that link is returned and removed from the pool's list of free links. Otherwise, memory for a new link is allocated from the pool.

ngx_create_chain_of_bufs allocates memory for a chain of links and buffers. It allocates bufs->num buffers of bufs->size bytes each. Each allocated buffer is bound to a link of the chain.

ngx_chain_get_free_buf returns the first free buffer (as provided by *free, a pointer to a chain of free buffers) if there is one. In this case, the returned link is removed from the list of free buffers. If there is no free buffer, the function allocates one link and one buffer from the pool and returns it (the link).

pool - the pool from which the links and buffers are allocated if necessary;
bufs - structure with the number of buffers to allocate and the size (in bytes) of each buffer;
free - a pointer to a pointer of a chain of free buffers;
result - a pointer to a structure ngx_chain_t in case of success, NULL otherwise.

To release a link, use:

ngx_free_chain(ngx_pool_t *pool, ngx_chain_t *cl);

pool - the pool to which the link is returned;
cl - the link to release.

Queue management is performed using ngx_chain_add_copy and ngx_chain_update_chains.

ngx_int_t ngx_chain_add_copy(ngx_pool_t *pool, ngx_chain_t **chain, ngx_chain_t *in);

pool - the memory pool used to allocate new chain links.
chain - the chain to which the new links will be added (at the end of the chain).
in - the chain to copy.
result - NGX_OK in case of success, NGX_ERROR in case of memory allocation failure.

The function copies each link of in and appends the copy to chain. The buffers themselves are not copied: each new link holds a reference to the buffer of the original link.

void ngx_chain_update_chains(ngx_chain_t **free, ngx_chain_t **busy, ngx_chain_t **out, ngx_buf_tag_t tag);

ngx_chain_update_chains appends the out chain to the busy chain, then moves every buffer at the head of busy that has been completely processed or sent and carries the tag tag to the free chain. Buffers that are still in use remain in busy.

free - a chain of free buffers. Potentially filled by this function.
out - a chain of buffers to output. The buffers in this chain are added to the busy chain.
busy - a chain of busy buffers. The first buffers in this chain that are tagged with tag are moved to the free chain. This process stops at the first buffer that is actually still busy (non-empty window).
tag - a tag used to identify the buffers that are candidates for release.

When a buffer is moved from busy to free, its window is reset to be empty and to point at the beginning of the buffer.

This function's algorithm goes something like this:
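
A minimal sketch of these steps with stand-in types, assuming the behaviour described by the parameters above. sample_cbuf_t, sample_chain_t and sample_update_chains are hypothetical names. The handling of empty buffers that carry a different tag (they are simply unlinked from busy) is my reading of the old nginx sources and should be treated as an assumption:

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields of ngx_buf_t used here. */
typedef struct {
    unsigned char  *pos;
    unsigned char  *last;
    unsigned char  *start;
    void           *tag;
} sample_cbuf_t;

typedef struct sample_chain_s sample_chain_t;
struct sample_chain_s {
    sample_cbuf_t   *buf;
    sample_chain_t  *next;
};

static void sample_update_chains(sample_chain_t **free_list,
                                 sample_chain_t **busy,
                                 sample_chain_t **out, void *tag)
{
    sample_chain_t *cl;

    /* step 1: append the out chain to the end of the busy chain */
    if (*busy == NULL) {
        *busy = *out;

    } else {
        for (cl = *busy; cl->next != NULL; cl = cl->next) { /* find tail */ }
        cl->next = *out;
    }

    *out = NULL;

    /* step 2: release buffers from the head of busy */
    while (*busy != NULL) {
        cl = *busy;

        if (cl->buf->last != cl->buf->pos) {
            break;               /* window not empty: still busy, stop */
        }

        *busy = cl->next;

        if (cl->buf->tag != tag) {
            cl->next = NULL;     /* not ours: just unlink it from busy */
            continue;
        }

        /* ours and fully processed: reset the window, move to free */
        cl->buf->pos = cl->buf->start;
        cl->buf->last = cl->buf->start;
        cl->next = *free_list;
        *free_list = cl;
    }
}
```

A module typically calls such a function after passing its output down the filter chain, so that buffers it owns can be reused for the next portion of data.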

2.5 Strings

In nginx, strings are stored in a Pascal-like way in order to avoid the overhead of calculating the length, as well as copying in some situations.

typedef struct {
    size_t      len;
    u_char     *data;
} ngx_str_t;

len - length of string in bytes,
data - a pointer to memory containing the string.
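
The benefit of keeping the length next to the pointer can be shown with a stand-alone sketch. sample_str_t, sample_string, sample_str_slice and sample_str_eq are hypothetical stand-ins written for this illustration. A substring is just another (len, data) pair pointing into the same memory, so no copy and no terminating zero byte are needed:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same layout as ngx_str_t. */
typedef struct {
    size_t          len;
    unsigned char  *data;
} sample_str_t;

/* Initializer for string literals; the length is known at compile time. */
#define sample_string(s)  { sizeof(s) - 1, (unsigned char *) s }

/* Return n bytes of s starting at offset from, without copying. */
static sample_str_t sample_str_slice(sample_str_t s, size_t from, size_t n)
{
    sample_str_t r;

    if (from > s.len) {
        from = s.len;
    }

    if (n > s.len - from) {
        n = s.len - from;
    }

    r.len = n;
    r.data = s.data + from;

    return r;
}

/* Compare a counted string with a C literal. */
static int sample_str_eq(sample_str_t a, const char *lit)
{
    size_t n = strlen(lit);

    return a.len == n && memcmp(a.data, lit, n) == 0;
}
```

Note that such strings are not necessarily zero-terminated, so they must not be passed to functions that expect C strings without taking len into account.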

2.6 Variables

Variables are named data containers that can be converted to or from a string. Variable values can be of any type.

The conversion to/from string is managed by these setters/getters:

typedef void (* ngx_http_set_variable_pt) (ngx_http_request_t * r,
                                           ngx_http_variable_value_t * v,
                                           uintptr_t data);

typedef ngx_int_t (*ngx_http_get_variable_pt) (ngx_http_request_t *r,
                                               ngx_http_variable_value_t * v,
                                               uintptr_t data);

ngx_http_set_variable_pt - the type of functions called to set a variable value;
ngx_http_get_variable_pt - the type of functions called to get a variable value;
r - the request where the variable value will actually be stored;
v - the value to get/set;
data - the offset in the ngx_http_request_t structure at which the variable to get/set is located;

The variable itself is described by this structure:

struct ngx_http_variable_s {
    ngx_str_t                     name;
    ngx_http_set_variable_pt      set_handler;
    ngx_http_get_variable_pt      get_handler;
    uintptr_t                     data;
    ngx_uint_t                    flags;

    [...]
};

name - name of the variable;
set_handler - function to use to set values for this variable;
get_handler - function to use to get values for this variable;
data - offset indicating where the variable data is located in the including structure.
flags - flags:

2.6.1 Adding a variable

To add a new variable, use:

ngx_http_variable_t *ngx_http_add_variable(ngx_conf_t *cf, ngx_str_t *name, ngx_uint_t flags);

cf - a module configuration to which the variable will be added;
name - name of the variable to add;
flags - flags defining the type of variable (see above)
result - a pointer to the added ngx_http_variable_t structure.

Example:

static ngx_str_t  ngx_http_var_name = ngx_string("var");
[...]
{
    ngx_http_variable_t  *var;

    var = ngx_http_add_variable(cf, &ngx_http_var_name, NGX_HTTP_VAR_NOCACHEABLE);
    if (var == NULL) {
        return NGX_ERROR;
    }
}

2.6.2 Retrieving the index of a variable

ngx_int_t ngx_http_get_variable_index(ngx_conf_t *cf, ngx_str_t *name);

cf - a configuration where the variable is defined,
name - name of the variable you are looking for
result - the index of the variable in the array of variables defined for this configuration.

2.6.3 Getting the string value of a variable

typedef struct {
    unsigned    len:28;

    unsigned    valid:1;
    unsigned    no_cacheable:1;
    unsigned    not_found:1;

    [...]

    u_char     *data;
} ngx_variable_value_t;

typedef ngx_variable_value_t  ngx_http_variable_value_t;

ngx_http_variable_value_t *ngx_http_get_indexed_variable(ngx_http_request_t *r,
                                                         ngx_uint_t index);

r - request. The value of the variable will be extracted in the context of this request;
index - the index of the variable as returned by ngx_http_get_variable_index;
result - the value of a variable as a pointer to a structure ngx_http_variable_value_t (synonym ngx_variable_value_t).

Fields in the ngx_variable_value_t structure mean:

2.7 Scripts

Scripts in nginx are essentially pieces of byte-code used to generate strings. A script can be created (or compiled) from a template, then used (or run) any number of times. Templates are strings including references to variables using the syntax $variable_name or ${variable_name} where variable_name can be a symbolic name (see example below) or a positional parameter (i.e. 0 to 9). The variables $0 to $9 are set by the rewrite module (see nginx rewrite module page for more).

When run, the script uses the current values of variables (or the value at the time it entered the cache if caching is authorized for this variable).

Templates are compiled into scripts using the ngx_http_script_compile function and this structure:

typedef struct {
    ngx_conf_t                 *cf; /* pointer to a configuration */
    ngx_str_t                  *source; /* template to be compiled */

    ngx_array_t               **lengths; /* byte-code to determine the
                                              length of the result */
    ngx_array_t               **values; /* byte-code to generate the result */
    
    ngx_uint_t                  variables; /*  The expected number of variables
                                              in the template. */

    unsigned                    complete_lengths:1; /* Should byte-code to
                                          determine the length be generated */
    unsigned                    complete_values:1; /* Should byte-code to
                                          determine the values be generated */
} ngx_http_script_compile_t;

Example:

static ngx_str_t ngx_http_script_source = ngx_string("Your IP-address is $remote_addr");

{
    ngx_http_script_compile_t   sc;
    ngx_array_t                 *lengths = NULL;
    ngx_array_t                 *values = NULL;

    ngx_memzero(&sc, sizeof(ngx_http_script_compile_t));

    sc.cf = cf;
    sc.source = &ngx_http_script_source;
    sc.lengths = &lengths;
    sc.values = &values;
    sc.variables = 1;
    sc.complete_lengths = 1;
    sc.complete_values = 1;

    if (ngx_http_script_compile(&sc) != NGX_OK) {
        return NGX_CONF_ERROR;
    }

    return NGX_CONF_OK;
}

To run the script, use the function ngx_http_script_run:

u_char *ngx_http_script_run(ngx_http_request_t *r, ngx_str_t *value,
                            void *code_lengths, size_t reserved,
                            void *code_values);

r - the request, in the context of which the script is to be run,
value - a pointer to a string that will receive the result of running the script.
code_lengths - a pointer to the code to get the length of the result,
reserved - reserved argument,
code_values - a pointer to the codes producing the values,
result - a pointer to the byte of memory following the last byte of the result, or NULL, if an error occurred.

Example:

[...]
{
    ngx_str_t value;

    if (ngx_http_script_run(r, &value, lengths->elts, 0,
                            values->elts) == NULL)
    {
        return NGX_ERROR;
    }

    [...]
}
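
To illustrate why a script carries two pieces of byte-code (lengths and values), here is a stand-alone sketch of the two-pass idea: a first pass computes the length of the result so that memory can be allocated in one go, and a second pass writes the result. It does not use nginx byte-code at all; sample_lookup, sample_is_name_char and sample_expand are hypothetical, only the $name syntax is handled, and the value returned for remote_addr is an arbitrary example address:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variable lookup: only "remote_addr" is known here. */
static const char *sample_lookup(const char *name, size_t len)
{
    if (len == 11 && memcmp(name, "remote_addr", 11) == 0) {
        return "192.0.2.1";
    }

    return "";
}

static int sample_is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
           || (c >= '0' && c <= '9') || c == '_';
}

/*
 * Walk the template once. With out == NULL only the length of the
 * result is computed (the "lengths" pass); otherwise the result is
 * also written to out (the "values" pass). No terminating zero byte
 * is written. Returns the length of the result.
 */
static size_t sample_expand(const char *tpl, char *out)
{
    size_t       len = 0, n;
    const char  *name, *val;

    while (*tpl != '\0') {
        if (*tpl != '$') {
            /* literal character: copy it as is */
            if (out != NULL) {
                out[len] = *tpl;
            }
            len++;
            tpl++;
            continue;
        }

        /* variable reference: collect the name and substitute its value */
        name = ++tpl;
        while (sample_is_name_char(*tpl)) {
            tpl++;
        }

        val = sample_lookup(name, (size_t) (tpl - name));
        n = strlen(val);

        if (out != NULL) {
            memcpy(out + len, val, n);
        }
        len += n;
    }

    return len;
}
```

A caller would run sample_expand(tpl, NULL) first, allocate that many bytes, and then call sample_expand(tpl, buffer) to produce the string.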

If the number of variables in the pattern is unknown, you can use this function to count them:

ngx_uint_t ngx_http_script_variables_count(ngx_str_t *value);

value - pointer to the template string with the variables to count
result - the number of variables in the template string.

Example:

static ngx_str_t ngx_http_script_source = ngx_string("Your IP-address is $remote_addr");
{
    ngx_uint_t                  n;
    ngx_http_script_compile_t   sc;
    ngx_array_t                 *lengths = NULL;
    ngx_array_t                 *values = NULL;

    n = ngx_http_script_variables_count(&ngx_http_script_source);

    if (n > 0) {
        ngx_memzero(&sc, sizeof(ngx_http_script_compile_t));

        sc.cf = cf;
        sc.source = &ngx_http_script_source;
        sc.lengths = &lengths;
        sc.values = &values;
        sc.variables = n;
        sc.complete_lengths = 1;
        sc.complete_values = 1;

        if (ngx_http_script_compile(&sc) != NGX_OK) {
            return NGX_CONF_ERROR;
        }
    }

    return NGX_CONF_OK;
}

If the template does not contain variables, you can skip the call to ngx_http_script_run, but you must then check whether the arrays containing the byte-code were left uninitialized (NULL), as shown below:

[...]
{
    ngx_str_t value;

    if (lengths == NULL) {
        value.data = ngx_http_script_source.data;
        value.len = ngx_http_script_source.len;
    } else {
        if (ngx_http_script_run(r, &value, lengths->elts, 0,
            values->elts) == NULL)
        {
            return NGX_ERROR;
        }
    }

    [...]
}

2.8. Regular Expressions

Regular expressions in nginx are implemented on top of the PCRE library and are available only when this library is made available by the system. In the source code, code using regular expressions is protected with the NGX_PCRE macro.

Regular expressions are similar to scripts: they are compiled once then executed many times. To compile a regular expression, use ngx_regex_compile:

#if (NGX_PCRE)
typedef pcre  ngx_regex_t;

ngx_regex_t *ngx_regex_compile(ngx_str_t *pattern, ngx_int_t options,
                               ngx_pool_t *pool, ngx_str_t *err);
#endif

pattern - a pointer to a string containing the regular expression;
options - the option flags as expected by the pcre_compile function.
pool - the pool, used to allocate memory for the regular expression;
err - a string containing a textual description of the error (if any) that occurred when compiling the regular expression;
result - a pointer to a structure containing the compiled regex.

The most recent versions of nginx (at least >= 0.8.53) use a ngx_regex_compile_t structure as input/output to the function ngx_regex_compile but principles are very similar.

The options parameter can contain a flag NGX_REGEX_CASELESS, to make the regular expression case insensitive.

Regular expression compilation example

ngx_regex_compile_t   rc;
ngx_str_t             pattern = ngx_string("^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)$");
u_char                errstr[NGX_MAX_CONF_ERRSTR];

ngx_memzero(&rc, sizeof(ngx_regex_compile_t));

rc.pool = cf->pool;
rc.err.len = NGX_MAX_CONF_ERRSTR;
rc.err.data = errstr;
rc.pattern = pattern;   /* ngx_string() is an initializer and cannot be assigned directly */
rc.options = NGX_REGEX_CASELESS;

if (ngx_regex_compile(&rc) != NGX_OK) {
    return NGX_CONF_ERROR;
}

There was no example in the original documentation (although there was a reference to one). So I decided to add this one (and make it compatible with version 0.8.53 of nginx).

To calculate the number of capturing subpatterns in the regular expression use the function ngx_regex_capture_count:

#if (NGX_PCRE)
ngx_int_t ngx_regex_capture_count(ngx_regex_t *re);
#endif

re - a pointer to the compiled regular expression;
result - the number of capturing subpatterns.

To execute a regular expression, use the function ngx_regex_exec:

#if (NGX_PCRE)
ngx_int_t ngx_regex_exec(ngx_regex_t *re, ngx_str_t *s, int *captures,
                         ngx_int_t size);
#endif

re - a pointer to the compiled regular expression;
s - a pointer to a string, against which the regular expression will be executed;
captures - an array of integers that will be set to identify the captured strings in the s string (see below).
size - number of elements in the array captures
result - a non-negative value if the regular expression matched, NGX_REGEX_NO_MATCHED if the regular expression did not match, less than NGX_REGEX_NO_MATCHED if an error occurred.

The number of elements in the captures array must be a multiple of three. To receive the whole match plus N capturing subpatterns, it should hold (N + 1) * 3 elements. The first two thirds of the vector contain the start and end positions of the captured substrings (the first pair describes the whole match); the remaining third is used as workspace by the PCRE library while matching subpatterns. Each even element of the first two thirds contains the position of the first character of a substring. Each odd element contains the position of the first character after the end of that substring.
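
The layout of the captures vector can be illustrated without calling PCRE. The sketch below assumes a vector that has already been filled in by a successful match, sized for the whole match plus four subpatterns, that is (4 + 1) * 3 = 15 elements as PCRE expects. sample_cap_t and sample_capture are hypothetical names; pair 0 is the whole match and pair n is the n-th subpattern:

```c
#include <stddef.h>

/* Hypothetical counted string pointing into the subject. */
typedef struct {
    size_t                len;
    const unsigned char  *data;
} sample_cap_t;

/* Return capture number n (0 = whole match) from a PCRE-style vector. */
static sample_cap_t sample_capture(const char *subject, const int *captures,
                                   int n)
{
    sample_cap_t  s;
    int           start = captures[2 * n];       /* even element: start */
    int           end = captures[2 * n + 1];     /* odd element: one past end */

    if (start < 0 || end < start) {
        /* the subpattern did not take part in the match */
        s.len = 0;
        s.data = NULL;
        return s;
    }

    s.data = (const unsigned char *) subject + start;
    s.len = (size_t) (end - start);

    return s;
}
```

For the subject "192.168.0.1" and the IP address pattern from the compilation example, the second subpattern would be described by the pair (4, 7), which selects "168".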

2.9. Module configuration

The configuration of a module is stored at run-time in a structure defined by the developer of the module. Any HTTP request is associated with configurations at three levels: main configuration, virtual server configuration and location configuration. At each level, the structure should keep only those configuration settings that are shared by all instances of configurations at lower levels. For example, the virtual server names and the address of the listening socket are shared by all locations of a virtual server, so it makes sense to store these settings in the configuration of the virtual server. To access the configuration of any level while parsing the configuration file, use the corresponding macro:

ngx_http_conf_get_module_main_conf(cf, module)
ngx_http_conf_get_module_srv_conf(cf, module)
ngx_http_conf_get_module_loc_conf(cf, module)

cf - pointer to the ngx_conf_t structure (configuration),
module - the ngx_module_t structure that describes the module (the macro takes the structure itself, not its address),
result - pointer to the module's configuration corresponding to the requested level.

To access the configuration of any level during request processing, use the corresponding macro:

ngx_http_get_module_main_conf(r, module)
ngx_http_get_module_srv_conf(r, module)
ngx_http_get_module_loc_conf(r, module)

r - a pointer to a ngx_http_request_t structure (containing the request being processed),
module - the ngx_module_t structure that describes the module (the macro takes the structure itself, not its address),
result - pointer to the module's configuration corresponding to the requested level.

2.10. Module context

During the processing of an HTTP request, a module can keep its per-request state (its context) in a structure defined by the module developer. To set the context of a module for a given HTTP request, the module uses the following macro:

ngx_http_set_ctx(r, c, module)

r - a pointer to a ngx_http_request_t structure (containing the request being processed),
c - pointer to the context of the module (a structure defined by the developer),
module - the ngx_module_t structure of the module (the macro takes the structure itself, not its address).

To retrieve the context of the module for the request, the module uses the following macro:

ngx_http_get_module_ctx(r, module)

r - a pointer to a ngx_http_request_t structure (containing the request being processed),
module - the ngx_module_t structure of the module (the macro takes the structure itself, not its address),
result - a pointer to the context of the module.
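The idea behind these two macros can be modelled in a few lines of standalone C. This is only a sketch under assumptions: request_t, module_t, set_ctx, get_module_ctx and sample_handler are hypothetical stand-ins, and the per-request array indexed by a per-module slot number mirrors what the nginx macros appear to do with ctx_index. In a real module the context would normally be allocated from the request pool rather than passed in.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES  4              /* arbitrary size for this sketch */

typedef struct {
    size_t  ctx_index;              /* slot assigned to the module */
} module_t;

typedef struct {
    void  *ctx[MAX_MODULES];        /* one context pointer per module */
} request_t;

/* stand-ins for ngx_http_set_ctx and ngx_http_get_module_ctx */
#define set_ctx(r, c, module)      ((r)->ctx[(module).ctx_index] = (c))
#define get_module_ctx(r, module)  ((r)->ctx[(module).ctx_index])

typedef struct {
    int  calls;                     /* per-request state of the module */
} sample_ctx_t;

static module_t  sample_module = { 2 };

/* Returns how many times the handler has run for this request. */
static int
sample_handler(request_t *r, sample_ctx_t *storage)
{
    sample_ctx_t  *ctx;

    assert(r != NULL && storage != NULL);

    ctx = get_module_ctx(r, sample_module);

    if (ctx == NULL) {
        /* first call for this request: attach a fresh context */
        storage->calls = 0;
        ctx = storage;
        set_ctx(r, ctx, sample_module);
    }

    return ++ctx->calls;
}
```

The pattern of "get the context, create it if it is missing" is what lets a handler that is re-entered for the same request resume where it stopped.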

3. nginx and modules

3.1. Phases of request processing in nginx

nginx processes HTTP requests by going through multiple phases. For each phase, nginx will call zero or more handlers. The phases are as follows:

  1. NGX_HTTP_SERVER_REWRITE_PHASE - phase that transforms the request URI at the virtual server level.
  2. NGX_HTTP_FIND_CONFIG_PHASE - phase that finds the appropriate location level configuration.
  3. NGX_HTTP_REWRITE_PHASE - phase that transforms the request URI at the location level.
  4. NGX_HTTP_POST_REWRITE_PHASE - phase that processes the results of the request URI transformation.
  5. NGX_HTTP_PREACCESS_PHASE - phase that prepares data for the phase that will actually check whether access to the resource should be granted.
  6. NGX_HTTP_ACCESS_PHASE - phase that checks access to the requested resource.
  7. NGX_HTTP_POST_ACCESS_PHASE - phase that processes the results of the checks performed in the previous phase.
  8. NGX_HTTP_CONTENT_PHASE - phase that actually generates the response.
  9. NGX_HTTP_LOG_PHASE - phase that performs logging.

Custom handlers may be registered for all phases except:

Handlers can return any of the following constants:

This section seems slightly out of date (in particular, an extra NGX_HTTP_TRY_FILES_PHASE appeared after the original document was written), as Valery's post on HTTP request processing phases and agentzh's comment on the interpretation of NGX_AGAIN/NGX_DONE indicate.

To register a handler, it must be added to the array of handlers of the appropriate phase in the ngx_http_core_module main configuration. An example is shown below for adding a handler to the NGX_HTTP_CONTENT_PHASE phase:

static ngx_int_t ngx_http_sample_module_init(ngx_conf_t *cf)
{
    ngx_http_handler_pt        *h;
    ngx_http_core_main_conf_t  *cmcf;

    cmcf = ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module);

    h = ngx_array_push(&cmcf->phases[NGX_HTTP_CONTENT_PHASE].handlers);
    if (h == NULL) {
        return NGX_ERROR;
    }

    *h = ngx_http_sample_module_handler;

    return NGX_OK;
}

Phase handlers are called regardless of configuration. Therefore, a handler must be able to determine whether it should run or not. If it determines that it should not run, it should return NGX_DECLINED, and do so as early as possible in order to avoid unnecessary overhead.
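The early NGX_DECLINED return can be pictured with a simplified, standalone model of a handler chain. Everything here is hypothetical (request_t, run_phase, the SK_ constants and both handlers); the real phase engine is more involved, and this sketch only shows a handler that checks its own "enabled" setting first and lets the next handler take over when it declines.

```c
#include <assert.h>
#include <stddef.h>

#define SK_OK         0             /* stands in for NGX_OK */
#define SK_DECLINED  (-5)           /* stands in for NGX_DECLINED */

typedef struct {
    int  sample_enabled;            /* would come from the location configuration */
    int  handled_by;                /* 0 = nobody yet, 1 = sample, 2 = fallback */
} request_t;

typedef int (*handler_pt)(request_t *r);

static int
sample_handler(request_t *r)
{
    if (!r->sample_enabled) {
        return SK_DECLINED;         /* cheap early exit, nothing else is touched */
    }

    r->handled_by = 1;
    return SK_OK;
}

static int
fallback_handler(request_t *r)
{
    r->handled_by = 2;
    return SK_OK;
}

/* Calls the handlers in order until one of them does not decline. */
static int
run_phase(const handler_pt *handlers, size_t n, request_t *r)
{
    size_t  i;
    int     rc = SK_DECLINED;

    assert(handlers != NULL && r != NULL);

    for (i = 0; i < n; i++) {
        rc = handlers[i](r);
        if (rc != SK_DECLINED) {
            break;
        }
    }

    return rc;
}
```

With the sample handler disabled, the request falls through to the fallback handler; with it enabled, the fallback handler is never reached.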

The NGX_HTTP_ACCESS_PHASE phase handlers determine whether access to a resource should be granted or not. In this phase, the sequence of calls to the handlers is determined by the satisfy directive. The values returned by the handlers of this phase take additional meaning:

If the directive is satisfy all then all handlers must return NGX_OK for nginx to proceed to the next phase.

If the directive is satisfy any then at least one handler must return NGX_OK for nginx to proceed to the next phase.
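The two rules can be written down as a small standalone function. This is a sketch under assumptions, not nginx code: access_granted, satisfy_t and SK_OK are hypothetical names, the handler results are passed in as a plain array, and every result other than NGX_OK is treated as a refusal, which is a simplification of what the access phase really does.

```c
#include <assert.h>
#include <stddef.h>

#define SK_OK  0                    /* stands in for NGX_OK */

typedef enum { SATISFY_ALL, SATISFY_ANY } satisfy_t;

/*
 * Returns 1 if the request may proceed to the next phase, 0 otherwise.
 * results holds the return values of the n access handlers (n >= 1).
 */
static int
access_granted(const int *results, size_t n, satisfy_t mode)
{
    size_t  i;

    assert(results != NULL && n > 0);

    for (i = 0; i < n; i++) {
        if (mode == SATISFY_ANY && results[i] == SK_OK) {
            return 1;               /* one approval is enough */
        }

        if (mode == SATISFY_ALL && results[i] != SK_OK) {
            return 0;               /* one refusal is enough */
        }
    }

    /* all: nobody refused; any: nobody approved */
    return mode == SATISFY_ALL;
}
```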

The NGX_HTTP_CONTENT_PHASE phase is used to generate a response. If the location level configuration of the ngx_http_core_module specifies a handler, then all requests are sent to this handler. Otherwise, nginx uses the handlers defined in the main configuration for the NGX_HTTP_CONTENT_PHASE phase.

The handler set at location level for the ngx_http_core_module will not be called again, even if it returns NGX_DONE. Example for setting a handler for the location level configuration of ngx_http_core_module:

static char *
ngx_http_sample_module_command(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    ngx_http_core_loc_conf_t  *clcf;

    clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);
    clcf->handler = ngx_http_sample_handler;

    return NGX_CONF_OK;
}

3.2. Integrating a module in nginx

To make a module known to nginx, you must provide the meta-information that describes how to initialize and configure it. This meta-information is provided by filling the structure ngx_module_t:

struct ngx_module_s {

    [...]

    ngx_uint_t            version;

    void                 *ctx;
    ngx_command_t        *commands;
    ngx_uint_t            type;

    [...]

    ngx_int_t           (*init_module)(ngx_cycle_t *cycle);
    ngx_int_t           (*init_process)(ngx_cycle_t *cycle);

    [...]

    void                (*exit_process)(ngx_cycle_t *cycle);
    void                (*exit_master)(ngx_cycle_t *cycle);

    [...]
};

typedef struct ngx_module_s      ngx_module_t;

Where the structure fields mean:

version - contains a version of the module structure (currently 1)
ctx - a pointer to the global context of the module,
commands - a C array containing all the descriptors of the directives supported by this module,
type - type of module: NGX_HTTP_MODULE, NGX_EVENT_MODULE, NGX_MAIL_MODULE and others,
init_module - handler called when the module gets initialized in the main process (i.e. the master that will spawn the workers),
init_process - handler called for initialization of a module in a new worker process,
exit_process - handler called when a worker process terminates,
exit_master - handler called when the master process terminates.

Example:

#include <ngx_config.h>
#include <ngx_core.h>

[...]

ngx_module_t  ngx_http_some_module = {
    NGX_MODULE_V1,
    &ngx_http_some_module_ctx,             /* module context */
    ngx_http_some_module_commands,         /* module directives */
    NGX_HTTP_MODULE,                       /* module type */
    NULL,                                  /* init master */
    NULL,                                  /* init module */
    NULL,                                  /* init process */
    NULL,                                  /* init thread */
    NULL,                                  /* exit thread */
    NULL,                                  /* exit process */
    NULL,                                  /* exit master */
    NGX_MODULE_V1_PADDING
};

Note: The instance of the type ngx_module_t must be visible from other files, that is, declared with the qualifier extern. However, since in C all definitions at file scope have external linkage by default, the qualifier is omitted in the example.

  1. The presence of the extern qualifier seems to be necessary in the file ngx_modules.c. However, this file is generated by the configure command based on its options. Therefore, the presence of this qualifier seems to be now handled by the build process (more specifically by the auto/modules script).
  2. The init_master, init_thread and exit_thread handlers do not seem to be called at all.

3.2.1. HTTP modules

For modules of type NGX_HTTP_MODULE, the field ctx of the structure ngx_module_t contains a pointer to a ngx_http_module_t structure:

typedef struct {
    ngx_int_t   (*preconfiguration)(ngx_conf_t *cf);
    ngx_int_t   (*postconfiguration)(ngx_conf_t *cf);

    void       *(*create_main_conf)(ngx_conf_t *cf);
    char       *(*init_main_conf)(ngx_conf_t *cf, void *conf);

    void       *(*create_srv_conf)(ngx_conf_t *cf);
    char       *(*merge_srv_conf)(ngx_conf_t *cf, void *prev, void *conf);

    void       *(*create_loc_conf)(ngx_conf_t *cf);
    char       *(*merge_loc_conf)(ngx_conf_t *cf, void *prev, void *conf);
} ngx_http_module_t;

Where the fields mean:

preconfiguration - handler called before processing the configuration file
postconfiguration - handler called after processing the configuration file
create_main_conf - handler called to create the main configuration,
init_main_conf - handler called to initialize the main configuration,
create_srv_conf - handler called to create the virtual server configuration,
merge_srv_conf - handler called to merge the virtual server configuration with the main configuration,
create_loc_conf - handler called to create the location configuration
merge_loc_conf - handler called to merge the location configuration with the configurations from higher levels.

Any of the fields may contain the value NULL, which means that no specific handler should be called for this step on this module.

ngx_http_module_t  ngx_http_some_module_ctx = {
    ngx_http_some_module_add_variables,    /* preconfiguration */
    NULL,                                  /* postconfiguration */

    NULL,                                  /* create main configuration */
    NULL,                                  /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    ngx_http_some_module_create_loc_conf,  /* create location configuration */
    NULL                                   /* merge location configuration */
};

3.2.2. Configuration directives of a module: description and processing

Configuration directives are described by the ngx_command_t structure:

struct ngx_command_s {
    ngx_str_t             name;
    ngx_uint_t            type;
    char               *(*set)(ngx_conf_t *cf, ngx_command_t *cmd, void *conf);
    ngx_uint_t            conf;
    ngx_uint_t            offset;
    void                 *post;
};

#define ngx_null_command  { ngx_null_string, 0, NULL, 0, 0, NULL }

typedef struct ngx_command_s     ngx_command_t;

Where the fields mean:

name - name of the directive as it will appear in the configuration file
type - type of the directive. It indicates the number and type of arguments that this directive accepts. Basic values are shown in the table below. These values can be combined with the bitwise or operator (|):

Type - Description
NGX_CONF_NOARGS - The directive takes no arguments.
NGX_CONF_TAKE1 ... NGX_CONF_TAKE7 - The directive takes the specified number of arguments (between 1 and 7).
NGX_CONF_TAKE12 - The directive takes 1 or 2 arguments. This is a shortcut for NGX_CONF_TAKE1 | NGX_CONF_TAKE2.
NGX_CONF_TAKE13 - The directive takes 1 or 3 arguments. Shortcut similar to NGX_CONF_TAKE12.
NGX_CONF_TAKE123 - The directive takes 1, 2 or 3 arguments. Shortcut similar to NGX_CONF_TAKE12.
NGX_CONF_TAKE1234 - The directive takes 1, 2, 3 or 4 arguments. Shortcut similar to NGX_CONF_TAKE12.
NGX_CONF_BLOCK - The directive takes an additional block element as argument.
NGX_CONF_FLAG - The directive is a flag (i.e. it can be on or off).
NGX_CONF_ANY - The directive takes 0 or more arguments.
NGX_CONF_1MORE - The directive takes 1 or more arguments.
NGX_CONF_2MORE - The directive takes 2 or more arguments.
NGX_DIRECT_CONF - The directive may be present in the main configuration file.
NGX_MAIN_CONF - The directive may be present at the root level of the configuration.
NGX_ANY_CONF - The directive can appear at any level of the configuration.
NGX_HTTP_MAIN_CONF - The directive can appear at the main level of the configuration of the HTTP server.
NGX_HTTP_SRV_CONF - The directive can appear at the (virtual) server level of the configuration.
NGX_HTTP_LOC_CONF - The directive can appear at the location level of the configuration.
NGX_HTTP_LMT_CONF - The directive may be present in the limit_except block of the configuration.
NGX_HTTP_LIF_CONF - The directive may be present in the if block at location level.
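How such flags combine with the bitwise or operator, and how a parser could check an argument count against them, can be sketched in standalone C. The SK_ constants and args_accepted are hypothetical and their values are chosen for this sketch only; they are not claimed to match the values used by nginx.

```c
#include <assert.h>

/* one bit per accepted argument count: bit n set means "n arguments allowed" */
#define SK_NOARGS   0x01u
#define SK_TAKE1    0x02u
#define SK_TAKE2    0x04u
#define SK_TAKE3    0x08u
#define SK_TAKE12   (SK_TAKE1 | SK_TAKE2)   /* shortcut built with | */
#define SK_1MORE    0x100u                  /* one or more arguments */

/* Returns 1 if a directive of the given type accepts nargs arguments. */
static int
args_accepted(unsigned type, unsigned nargs)
{
    assert(type != 0);              /* a directive must declare something */

    if ((type & SK_1MORE) && nargs >= 1) {
        return 1;
    }

    if (nargs > 3) {
        return 0;                   /* no bit defined beyond SK_TAKE3 here */
    }

    return (type & (1u << nargs)) != 0;
}
```

A directive declared with SK_TAKE12 would therefore accept "foobar a;" and "foobar a b;" but reject "foobar;" and "foobar a b c;".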

set - the handler called when the directive is parsed in the configuration. To ease things, nginx provides many standard handlers:

Name of the handler - Data type - Type of the field in the module configuration that is indicated by the offset field (see below)
ngx_conf_set_flag_slot - Flag - ngx_flag_t
ngx_conf_set_str_slot - String - ngx_str_t
ngx_conf_set_str_array_slot - Array of strings - A pointer to a ngx_array_t containing ngx_str_t elements
ngx_conf_set_keyval_slot - Array of key-value pairs - A pointer to a ngx_array_t containing ngx_keyval_t elements
ngx_conf_set_num_slot - Signed integer - ngx_int_t
ngx_conf_set_size_slot - Length - size_t
ngx_conf_set_off_slot - Offset - off_t
ngx_conf_set_msec_slot - Milliseconds - ngx_msec_t
ngx_conf_set_sec_slot - Seconds - time_t
ngx_conf_set_bufs_slot - The number and size of buffers - ngx_bufs_t
ngx_conf_set_enum_slot - Enumerated values (defined by the post field, which is then treated as a C array of ngx_conf_enum_t) - ngx_uint_t
ngx_conf_set_bitmask_slot - Binary mask (the values corresponding to setting/unsetting the bits are defined by the post field, which is treated as a C array of ngx_conf_bitmask_t) - ngx_uint_t
ngx_conf_set_path_slot - The path in the file system and the number of characters in the subdirectories - ngx_path_t
ngx_conf_set_access_slot - Access rights - ngx_uint_t

conf - the configuration level of the module that the directive writes to (for example NGX_HTTP_LOC_CONF_OFFSET), or 0 if the directive has its own specific handler,
offset - offset of the field to set in the module configuration.
post - extra information needed by specific set handlers for post-processing.

The list of directives for a module is implemented as a C array whose terminating element is ngx_null_command. Example:

typedef struct {
    ngx_str_t   foobar;
} ngx_http_some_module_loc_conf_t;

static ngx_command_t  ngx_http_some_module_commands[] = {

    { ngx_string("foobar"),
      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE1,
      ngx_conf_set_str_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_some_module_loc_conf_t, foobar),
      NULL },

    ngx_null_command
};

The example above describes the directive foobar which takes one argument. The directive can appear at the main, (virtual) server and location levels of the HTTP server. The argument of the directive is converted to a string that is stored in the foobar field of the module configuration structure.
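The role of the offset field can be shown with a standalone model of a generic "slot" setter. The names str_t, command_t, sample_loc_conf_t and set_str_slot are hypothetical; the sketch only assumes what the description above implies, namely that a standard handler locates the target field by adding the offset to the address of the configuration structure. Refusing a second assignment is modelled on the "is duplicate" behaviour of the standard string handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {                    /* modelled on ngx_str_t */
    size_t       len;
    const char  *data;
} str_t;

typedef struct {
    int    enabled;
    str_t  foobar;
} sample_loc_conf_t;

typedef struct {
    const char  *name;              /* directive name */
    size_t       offset;            /* offsetof() of the target field */
} command_t;

/*
 * Generic setter: knows nothing about sample_loc_conf_t, only the offset.
 * Returns 0 on success, -1 if the field has already been set.
 */
static int
set_str_slot(const command_t *cmd, void *conf, const char *value)
{
    char   *p = conf;
    str_t  *field;

    assert(cmd != NULL && conf != NULL && value != NULL);

    field = (str_t *) (p + cmd->offset);

    if (field->data != NULL) {
        return -1;                  /* directive given twice */
    }

    field->len = strlen(value);
    field->data = value;

    return 0;
}
```

Because the setter works only from the offset, the same function can serve every string directive of every module, which is the reason the offset is stored in the directive descriptor.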

Creation of the configurations

Before processing the configuration file, the structure containing a configuration should be allocated and initialized. This is the role of the handlers create_main_conf, create_srv_conf and create_loc_conf.

Merging configurations

To simplify the configuration process, each level of configuration creates the template of the subsequent configuration levels. Consider the example:

http {
    server {

        gzip_buffers 10 4k;

        location /foobar {
            # gzip_buffers 10 4k (inherited from upper level)
            gzip on;
        }

        location /foobaz {
            # gzip_buffers 10 4k (inherited from upper level)
        }
    }
}

When processing the server block, nginx will create a template for the configuration of the ngx_http_gzip_filter module and will apply the gzip_buffers directive to it. When processing location /foobar {}, nginx will create another instance of the configuration for the ngx_http_gzip_filter module and will apply the gzip directive to it. After processing the http block, the template for this module's configuration must be merged with the configurations resulting from the processing of the /foobar and /foobaz locations. When merging, all unset configuration values are set to the values from the appropriate configuration template or to default values. To make the implementation of merging handlers easier, the following macros are provided:

Name of the macro - Data type - Type of the field in the module configuration
ngx_conf_merge_ptr_value - Pointer - Any pointer (void * for example)
ngx_conf_merge_uint_value - Integer - ngx_uint_t
ngx_conf_merge_msec_value - Milliseconds - ngx_msec_t
ngx_conf_merge_sec_value - Seconds - time_t
ngx_conf_merge_size_value - Length - size_t
ngx_conf_merge_bufs_value - The number and size of buffers - ngx_bufs_t
ngx_conf_merge_bitmask_value - Binary mask - ngx_uint_t
ngx_conf_merge_path_value - The path in the file system and the number of characters in the subdirectories - ngx_path_t

Merging of the (virtual) server configuration is performed by the merge_srv_conf handler. Merging of the location level configuration is performed by the merge_loc_conf handler. Example:

typedef struct {
    ngx_str_t str_param;
    ngx_uint_t int_param;
} ngx_http_sample_module_loc_conf_t;

static char *
ngx_http_some_module_merge_loc_conf(ngx_conf_t *cf, void *parent, void *child)
{
    ngx_http_sample_module_loc_conf_t  *prev = parent;
    ngx_http_sample_module_loc_conf_t  *conf = child;

    ngx_conf_merge_str_value(conf->str_param, prev->str_param, "default value");

    ngx_conf_merge_uint_value(conf->int_param,
                              prev->int_param, 1);

    return NGX_CONF_OK;
}

ngx_http_module_t  ngx_http_some_module_ctx = {
    NULL,                                  /* preconfiguration */
    NULL,                                  /* postconfiguration */

    NULL,                                  /* create main configuration */
    NULL,                                  /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    ngx_http_some_module_create_loc_conf,  /* create location configuration */
    ngx_http_some_module_merge_loc_conf    /* merge location configuration */
};

3.2.3. Description and calculation of module variables

Module variables must be created before the parsing of the configuration blocks in which these variables may be encountered. To create such a variable, use the preconfiguration handler in the ngx_http_module_t structure. Example:

static ngx_int_t
ngx_http_some_module_add_variables(ngx_conf_t *cf);

ngx_http_module_t  ngx_http_some_module_ctx = {
    ngx_http_some_module_add_variables,    /* preconfiguration */

    [...]
};

static ngx_http_variable_t ngx_http_some_module_variables[] = {

    { ngx_string("var"), NULL, ngx_http_some_module_variable,
      0,
      NGX_HTTP_VAR_NOCACHEABLE, 0 },

    { ngx_null_string, NULL, NULL, 0, 0, 0 }
};

static ngx_int_t
ngx_http_some_module_add_variables(ngx_conf_t *cf)
{
    ngx_http_variable_t  *var, *v;

    for (v = ngx_http_some_module_variables; v->name.len; v++) {
        var = ngx_http_add_variable(cf, &v->name, v->flags);
        if (var == NULL) {
            return NGX_ERROR;
        }

        var->get_handler = v->get_handler;
        var->data = v->data;
    }

    return NGX_OK;
}

To compute the value of a variable, you must implement a function that fills the ngx_http_variable_value_t structure using data from the request, from the context, from the configuration of the module or from other sources. Example:

static ngx_int_t
ngx_http_some_module_variable(ngx_http_request_t *r,
    ngx_http_variable_value_t *v,  uintptr_t data)
{
    v->valid = 1;
    v->no_cacheable = 1;
    v->not_found = 0;

    v->data = (u_char*)"42";
    v->len = 2;

    return NGX_OK;
}

3.3. Compilation and assembly of modules in nginx

To build nginx with the module, pass the path to the directory where the module resides to ./configure through the --add-module= parameter. This directory must contain a file named config. The config file is a script that is executed by the nginx build system and whose role is to set a number of variables that control the assembly of the module into nginx. Here is a list of the most important variables:

Variable name - Description
ngx_addon_name - Name of the module.
NGX_ADDON_SRCS - Lists all the source files of all the modules that must be assembled.
NGX_ADDON_DEPS - Lists all dependent files of all modules (usually header files) that should be assembled.
HTTP_MODULES - List of all HTTP modules.
HTTP_AUX_FILTER_MODULES - List of all filter modules.
USE_MD5 - Whether MD5 support should be enabled (YES/NO).
USE_SHA1 - Whether SHA1 support should be enabled (YES/NO).
USE_ZLIB - Whether ZLIB support should be enabled (YES/NO).

The directory where the config file is located can be referenced inside the config through the variable ngx_addon_dir. Example of a config file:

ngx_addon_name=ngx_http_sample_module
HTTP_MODULES="$HTTP_MODULES ngx_http_sample_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_sample_module.c"

Assuming the module files are located in the directory /home/valery/work/sample_module, to enable this module, configure nginx as follows:

path/to/nginx$ ./configure --add-module=/home/valery/work/sample_module

Then...

path/to/nginx$ make
path/to/nginx$ make install

After that, the installed version of nginx will include support for the directives and variables of the newly added module.

4. Modules

This chapter is still unfinished. It should contain important and interesting material but, for now the author has no good idea on what to include.

Contact the author

Kholodkov Valery valery+nginx@grid.net.ru
Please use address above to send me email.

Links

Nginx: www.sysoev.ru/nginx/ - a web server developed by Igor Sysoev

Acknowledgments


Copyright (C) 2008 Valery Kholodkov