Module cache

The objective is to consolidate available module information into a small number of files, so that global module queries, such as those made by the avail or whatis sub-commands, process these files instead of walking through all modulepath directories to compute this information.

Cache files are expected to lower the I/O load and to handle more efficiently environments with thousands of modulefiles stored on a parallel filesystem.

How to store cache

The module cache will be a file aggregating available module information. Modules are organized into modulepaths that are enabled in the user environment. The goal is to have one module cache file per modulepath. This file will be stored under the modulepath root with the .modulecache file name.
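
The layout described above can be pictured with a short Python sketch. It is only an illustration, not the Modules implementation: the helper name cache_files_for is hypothetical, and it assumes the enabled modulepaths are given as a colon-separated string in the style of the MODULEPATH environment variable.

```python
import posixpath

CACHE_FILE_NAME = ".modulecache"  # file name stated in this document


def cache_files_for(modulepath_value):
    """Return one candidate cache file path per enabled modulepath.

    Assumption: modulepath_value is a colon-separated list of directories.
    """
    paths = [p for p in modulepath_value.split(":") if p]
    # One cache file per modulepath, stored at the modulepath root.
    return [posixpath.join(p, CACHE_FILE_NAME) for p in paths]
```

For example, a value listing /apps/modulefiles and /home/alice/modules (both hypothetical paths) yields one .modulecache path under each of these two directories.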

Depending on its ownership, a modulepath may or may not have a cache file built. It seems more efficient to save the cache on a per-modulepath basis, as a user may enable diverse modulepaths, some of them not maintained by the admins of the machine they are running on.

Having a central system cache may miss some user-managed modulepaths.

Having a per-user cache file does not seem a good idea. If built for the user, a lot more information may be pre-processed: dynamic information from the modulerc files, for instance, may be processed at cache file creation rather than when using the cache file. But users cannot be expected to build their own cache and keep it up to date. Having the cache on the user side can lead to additional issues, with users complaining to support that they cannot find a given module when the actual cause is an outdated cache file.

In the end, there will be one cache file per modulepath directory.

Cache content

As cache files should be used instead of walking through the modulepath directory to search for valid modulercs and modulefiles, a cache file should contain everything needed to process avail or whatis queries on its own.

Yet module search can be very dynamic as things set for instance in modulerc files may dynamically produce new modulefiles or define properties specific to some users or groups.

So the produced cache file cannot be the static view generated for the user who has built the cache file. Thus dynamic shaping is preserved by storing scripts and commands in the cache file.

Modulerc files may be highly dynamic, so their full content is saved as is into the cache file. An entry is set for each valid modulerc file. The same goes for modulefiles, as some conditional structures in their code can dynamically change dependency, whatis or other definitions.

It is expected that modulefiles will be evaluated on avail or whatis commands for complex queries. So for such use cases it is important to have modulefile content cached.

Thus by reading the cache file, the content of all the modulercs and modulefiles of the modulepath is fetched in one I/O operation.

The goal is to record in the cache file every piece of information obtained as a result of the findModules procedure. Thus the cache file contains modulefile modification times, which are reported in --long output. The modulefile validity check result is also recorded if the modulefile is not valid. The modulerc validity check is not recorded as it is not information produced by findModules.

Note: when generating the cache file, modulercs and modulefiles are not evaluated to check that their code executes correctly. Their bare content is recorded. The same result is obtained this way whether a cache file is used or not.

Cache format

The goal is to make the cache file a Tcl script, like modulefiles. It will not be exactly a modulefile, as specific commands will be set up to evaluate the cache file and the cache file will not make use of modulefile commands. See the Cache evaluation section for details on how a cache file is evaluated.

A cache file starts with a #%Module string, like modulefiles. A version number is appended right after these first characters (for instance #%Module5.3). See the Cache validity section for details on the cache file prefix string.

Modulefile content in a modulepath is defined in the cache with the modulefile-content command, which takes, in this order:

  • relative file path

  • file last modification date (as a Unix epoch time)

  • file module header

  • file body content.

For instance:

  • modulefile-content foo/1.0 1234567890 #%Module {body}

  • or modulefile-content {foo/w s} 1234567890 #%Module5.2 {body}
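
To make the argument order concrete, here is a minimal Python sketch that renders such an entry. It is not how Modules writes its cache: the function names are hypothetical and the quoting is deliberately simplified (a path is braced only when it is empty or contains whitespace, and the body is assumed to have balanced braces).

```python
def tcl_word(value):
    """Brace a value when it is empty or contains whitespace.

    Simplified quoting, assumed sufficient for this illustration only.
    """
    if value == "" or any(ch.isspace() for ch in value):
        return "{" + value + "}"
    return value


def modulefile_content_entry(relpath, mtime, header, body):
    """Render a modulefile-content line: path, mtime, header, body."""
    # Assumption: body has balanced braces, so plain bracing is enough.
    return f"modulefile-content {tcl_word(relpath)} {mtime} {header} {{{body}}}"
```

Called with the values of the two examples above, this sketch reproduces the same two lines, including the braces around the foo/w s path.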

A similar command exists for modulercs: modulerc-content, which accepts the following arguments, in this order:

  • relative file path

  • file module header

  • file body content.

For instance: modulerc-content foo/.modulerc #%Module {body}. Modification time is not needed for modulerc as this information is not reported on an avail sub-command in long mode.

Every modulerc and modulefile contained in the modulepath is recorded in the cache with the modulerc-content or modulefile-content command. Modulefiles whose name starts with a dot character are also recorded in the cache.

Invalid modulefiles are recorded in the cache file through the modulefile-invalid command rather than modulefile-content. It accepts the following arguments:

  • relative file path

  • invalidity kind

  • error message

For instance: modulefile-invalid foo/2.0 invalid {Magic cookie '#%Module' missing}

Files or directories that have limited access are recorded with specific commands:

  • limited-access-file foo/1.0

  • limited-access-directory foo

For a file, limited access means that it cannot be read by the user who builds the cache or by other users. For a directory, it means that it cannot be either read or searched by the user who builds the cache or by other users.
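
As an illustration of this definition, the following Python sketch classifies an element from its Unix permission bits. It is an assumption-laden simplification: it only looks at the "other" permission bits of the mode, whereas the actual test performed by Modules is not detailed here, and both function names are hypothetical.

```python
import stat


def is_limited_file(mode):
    """A file is treated as limited when "other" users cannot read it."""
    return not (mode & stat.S_IROTH)


def is_limited_directory(mode):
    """A directory is limited unless "other" can both read and search it."""
    needed = stat.S_IROTH | stat.S_IXOTH  # read + search (execute) bits
    return (mode & needed) != needed
```

In practice the mode would come from something like os.stat(path).st_mode. A directory with mode 0o754 is readable but not searchable by other users, so this sketch reports it as limited.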

A modulefile or a modulerc is not recorded with modulefile-content or modulerc-content if the file or one of its parent directories has limited access. This way only content that can be read by everyone is recorded into the cache file. Sensitive information is excluded from the cache file.

Note: a cache file generated by a privileged user (who has access to everything) and a cache file generated by a less privileged user will be the same, as limited-access content will not be included in the cache file.

Files or directories recorded as limited access need to be tested (and walked down for directories) when the cache file is evaluated. Distinguishing files from directories matters, as it saves a file stat test to determine whether an element is a directory when limited access elements are tested. There is no need to distinguish modulefiles from modulercs, as this difference is visible from the file name.

Note: Limited access tests are skipped on Windows platform as Unix-style file permission cannot be tested there.

Recording full modulefile content or subset of elements

It was initially drafted that only a subset of the elements of modulefiles would be recorded in the cache file, to reduce the size of this file and its evaluation time.

Recording full modulefile content is in the end preferred as:

  • this solution is simpler to implement

  • cache file size is not too big in the end (~ hundreds of KB for a thousand modulefiles)

  • evaluation time of large cache file is acceptable (time taken to evaluate cannot be noticed by user)

  • simpler to understand and manage for sites

Producing a cache entry for a modulefile with only a subset of commands recorded (like variant or requirement) is only feasible for modulefiles that do not use conditionals or a specific evaluation scheme. Recording full content works in any scenario, whereas recording a subset limits cache usage. Moreover, if only a subset of elements is recorded, it is hard to determine, depending on the modulefile set, where the cache can be used and where it cannot.

Cache validity

Cache file header indicates a Modules version number. It corresponds to the Modules version:

  • the cache file has been built with

  • the cache file is compatible with

It seems reasonable to ask for a cache file update every time Modules is upgraded to a newer minor version (for instance from 5.3 to 5.4):

  • it is simpler for staff and users to understand when the cache is taken into account and when it is ignored

  • better to ensure cache file is accurate for the Modules version as modulerc and modulefile commands may evolve from one version to another

Cache usage

Any time a modulepath directory is opened to get its content, the module cache file will be used instead if available.

Modulepath content analysis is performed by the findModules procedure. So any sub-command calling it (directly or through getModules or getPathToModule) will use the cache file. It corresponds to the following sub-commands:

  • lint

  • paths

  • search

  • whatis

  • aliases

  • avail

  • switch

  • restore

  • save

  • display

  • path

  • source

  • load

  • test

  • edit

  • help

It may also occur during other sub-commands that evaluate modulefiles using the is-avail command, like unload or refresh.

Cache files are ignored if ignore_cache configuration option is enabled. This option can also be enabled just for one execution with the --ignore-cache command line option.

Expired cache files are ignored if the cache expiry mechanism is enabled through the cache_expiry_secs configuration option. When this option is set to 0, a cache file never expires. This is the default behavior. If set to something else, a cache file is expired if its last modification time is older than the number of seconds defined in cache_expiry_secs. When expiry is enabled, the option value is an integer between 1 and 31536000, which is the number of seconds in one year.
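
The expiry rule can be summarized by the Python sketch below. It is a minimal illustration of the rule as worded in this section, not the Modules code: the function name is hypothetical, and treating a cache whose age equals cache_expiry_secs exactly as still valid is an assumption drawn from the "older than" wording.

```python
MAX_EXPIRY_SECS = 31536000  # 365 days * 24 hours * 3600 seconds


def is_cache_expired(cache_mtime, now, expiry_secs):
    """Tell whether a cache file last modified at cache_mtime is expired.

    cache_mtime and now are Unix epoch times in seconds.
    """
    if not 0 <= expiry_secs <= MAX_EXPIRY_SECS:
        raise ValueError("cache_expiry_secs must be between 0 and 31536000")
    if expiry_secs == 0:
        return False  # 0 means the cache never expires
    # Expired when the file is older than the configured number of seconds.
    return (now - cache_mtime) > expiry_secs
```

With cache_expiry_secs set to 3600, a cache file modified two hours ago is ignored, while one modified ten minutes ago is used.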

Is there an impact of evaluating the full cache file rather than making a directory walk-through to find a module? The cache file is fully read, but not all the files described in it are evaluated: only those corresponding to the search are, as is done when walking a modulepath directory and evaluating only the modulerc files corresponding to the query. So results should be the same whether a cache file is used or not: compared to a search without cache, no extra modulefile or modulerc evaluation is performed when a cache file is used.

As cache is recorded with both mcookie_check and mcookie_version_check options enabled, these two options are not honored (if disabled) when a cache file is used. They are primarily useful to skip I/O tests when walking through the content of a modulepath directory. As these I/O tests are done during the cache build process, the options are useless when using cache files.

When the cache file magic cookie defines a Modules version greater than the current one, the cache file is silently ignored. Raising an error is not useful, as different versions of Modules may be deployed in the same site environment.
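
A possible reading of this version rule is sketched below in Python. The function names are hypothetical, and two points are assumptions rather than documented behavior: the version is taken to have a major.minor form (as in #%Module5.3), and a header without a version number is treated as unusable. How a cache built with an older version is handled is not covered by this sketch beyond accepting it.

```python
import re

# Matches a header such as "#%Module5.3" and captures major and minor.
_COOKIE_RE = re.compile(r"^#%Module(\d+)\.(\d+)")


def cache_version(header_line):
    """Return (major, minor) from the cache header, or None if absent."""
    match = _COOKIE_RE.match(header_line)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def cache_is_usable(header_line, current_version):
    """Ignore a cache built for a newer Modules version than the current one."""
    version = cache_version(header_line)
    if version is None:
        return False  # assumption: a header without version is not usable
    # Tuples compare numerically, so 5.10 is correctly newer than 5.9.
    return version <= current_version
```

Comparing integer tuples rather than strings avoids ordering 5.10 before 5.9.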

When cache file is not in sync

When the cache is used, files or directories remain freely available through it even if, after the cache has been built:

  • their access is limited

  • they are deleted

  • their content changes and is no longer valid

When files or directories have their access limited before the cache is built and these access limitations are lifted afterward, these elements still require an access test to check whether they are available. This test will always succeed, as access to these elements is no longer limited.

If files or directories do not exist when cache is built, they will not be found when cache is used.

If a modulefile is recorded in the cache as invalid, it will stay invalid while the cache is used, even if the modulefile is fixed. The cache needs to be regenerated.

Read/write performance

cache_buffer_bytes configuration option defines size of the buffer when reading or writing cache files.

With a bigger buffer, fewer read or write system calls are needed to read or write a cache file. On busy storage systems this can improve I/O performance.
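
The effect of the buffer size can be estimated with simple arithmetic, sketched here in Python. This is a back-of-the-envelope model only: the function name is hypothetical, the sizes used as examples are not taken from Modules defaults, and the real number of system calls depends on the I/O layer in use.

```python
def read_calls_needed(file_size, buffer_bytes):
    """Estimate how many buffer-sized reads cover file_size bytes."""
    if buffer_bytes <= 0:
        raise ValueError("buffer size must be positive")
    # Ceiling division without floating point.
    return -(-file_size // buffer_bytes)
```

Under this model, a 300 KB cache file (307200 bytes) needs 10 reads with a 32768-byte buffer and a single read with a 1 MB buffer.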

Cache evaluation

A Tcl sub-interpreter is created to analyze cache files. This sub-interpreter is set up to evaluate cache file-specific commands, like modulefile-content.

When evaluated, the modulefile-content, modulerc-content and modulefile-invalid commands populate the read cache structures of modulercs and modulefiles. This way, when a modulefile for instance needs to be read, its content is already found in the in-memory cache structure. It corresponds to the following global variables:

  • ::g_modfileContent

  • ::g_fileMtime (only for valid modulefile)

  • ::g_modfileValid (only for modulefile, valid or not)

In addition, a ::g_cacheModpath array is filled with one entry dedicated to each modulepath. The content of this entry mimics the result list returned by the findModules procedure, with information for the whole content of the modulepath.

Limited access files and directories described in cache by limited-access-file and limited-access-directory commands populate specific structures to indicate some entries in modulepath have to be tested (and walked down for directories) to determine if they are available to current user:

  • g_cacheFLimitedModpath

  • g_cacheDLimitedModpath

These two structures are arrays with one entry per cached modulepath. Limited access entries are tested if they match search query. Test is done through findModulesFromDirsAndFiles procedure which corresponds to the walk down code extracted from findModules.

This specific interpreter is reused between different cache file evaluations. As for the modulefile interpreter, a consistency check is performed before each reuse to test that the cache file-specific commands have not been rewritten during a previous cache file evaluation.

Cache file evaluation is tracked to avoid evaluating twice the same cache file.

Cache evaluation stops if an erroneous command or syntax is encountered. As for an erroneous modulerc, the error is not reported during avail or whatis commands unless run in debug mode. The error is reported during a load evaluation. Cache evaluation is considered failed if there is an error in the cache file, thus a non-cache module search will occur instead of relying on the cache module listing. However, descriptions of modulefiles and modulercs evaluated in the cache before the error occurs are retained.
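
The evaluation behavior described in this section can be modeled with the Python sketch below. It is a conceptual model, not the Tcl sub-interpreter used by Modules: cache entries are assumed to be already split into a command name and its arguments, the dictionaries merely stand in for ::g_modfileContent, ::g_fileMtime and ::g_modfileValid, and the way values are stored in them is an assumption.

```python
def evaluate_cache(entries):
    """Populate in-memory structures from parsed cache entries.

    Each entry is a sequence: command name followed by its arguments.
    Evaluation stops at the first erroneous entry, but what was read
    before the error is kept.
    """
    content = {}  # path -> (header, body), for modulefiles and modulercs
    mtime = {}    # path -> modification time, valid modulefiles only
    valid = {}    # path -> True, or (kind, message) for invalid modulefiles
    ok = True
    for entry in entries:
        try:
            command, *args = entry
            if command == "modulefile-content":
                path, mtime_str, header, body = args
                mtime[path] = int(mtime_str)
                content[path] = (header, body)
                valid[path] = True
            elif command == "modulerc-content":
                path, header, body = args
                content[path] = (header, body)
            elif command == "modulefile-invalid":
                path, kind, message = args
                valid[path] = (kind, message)
            else:
                raise ValueError(f"unknown cache command: {command}")
        except ValueError:
            ok = False  # caller falls back to a non-cache module search
            break
    return {"ok": ok, "content": content, "mtime": mtime, "valid": valid}
```

A wrong argument count, a non-numeric modification time or an unknown command all end the evaluation with ok set to False, while entries processed earlier remain available in the returned structures.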

cachebuild sub-command

The cachebuild sub-command creates a module cache file in modulepaths. Without arguments, it attempts to create a cache in every enabled modulepath where the running user has write permission. If arguments are provided, the cache is built in the directories designated by these arguments.

General properties:

  • Shortcut name: none

  • Accepted option: none

  • Expected number of arguments: 0 to N

  • Accept boolean variant specification: no

  • Parse module version specification: no

  • Fully read modulefile when checking validity: yes

  • Sub-command only called from top level: yes

  • Lead to modulefile evaluation: yes (cachebuild)

An error is returned for each specified directory where the current user has no write access.

An error is returned if a modulefile or a modulerc cannot be read. This error ends cache content generation for current modulepath. Build continues with next modulepath after this error.

Modulepaths where current user has no write rights are skipped and reported with a warning notice.
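
The target selection rules above can be put together in a small Python sketch. It reflects one reading of this section (errors for explicitly specified directories, warnings for enabled modulepaths when no argument is given) and is not the Modules implementation. The function name and the can_write predicate are hypothetical.

```python
def plan_cachebuild(args, enabled_modulepaths, can_write):
    """Split candidate directories into build targets, errors and warnings.

    can_write is a callable telling whether the current user may write
    in a directory, for instance lambda d: os.access(d, os.W_OK).
    """
    targets, errors, warnings = [], [], []
    if args:
        # Directories given on the command line: no write access is an error.
        for directory in args:
            (targets if can_write(directory) else errors).append(directory)
    else:
        # No argument: every enabled modulepath, non-writable ones are
        # skipped and reported with a warning.
        for directory in enabled_modulepaths:
            (targets if can_write(directory) else warnings).append(directory)
    return targets, errors, warnings
```

Passing the write test as a callable keeps the sketch independent from the filesystem, which makes the rules easy to check on fixed inputs.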

A Creating <modulepath> block header message is reported for each cache file created or updated. This report is made when verbosity is set to normal or a higher mode.

The mcookie_check and mcookie_version_check options are both enabled when recording the cache. The cache is thus produced with the exact same content whether or not these options are enabled in the configuration.

cacheclear sub-command

The cacheclear sub-command deletes all module cache files in enabled modulepaths.

General properties:

  • Shortcut name: none

  • Accepted option: none

  • Expected number of arguments: 0

  • Accept boolean variant specification: no

  • Parse module version specification: no

  • Fully read modulefile when checking validity: no

  • Sub-command only called from top level: yes

  • Lead to modulefile evaluation: no

Modulepaths where current user has no write rights on the modulepath directory are skipped and reported with a warning notice.

A Deleting <modulepath> block header message is reported for each cache file deleted. This report is made when verbosity is set to normal or a higher mode.