Just Nest It

The usefulness of nested functions in MATLAB has been the subject of heated debate (check out the comments in Loren Shure’s post Nested Functions and Variable Scope for instance) mainly due to the complexity of data flow.

This is not an attempt to disprove those complexity arguments. The relative complexity is real, but one might argue that it is an inevitable consequence of data sharing. An editor affordance (Tim Davis’s “definitions”: (1) the ability of one to pay the costs for attending a prom, (2) what you do if you turn the wheels too sharply in your Mustang while driving on ice; Joel Spolsky’s definition in the context of a GUI) could probably make the situation more tenable and the coding process less error-prone. But that is another story. This is, however, an attempt to showcase the beauty and versatility that nested functions can bring to a piece of code, provided the author is mindful of when and how to take advantage of their data sharing features without compromising maintainability.

Before embarking on this somewhat tortuous journey, a refresher is in order. What follows is a list of points to bear in mind when dealing with nested functions, excerpted (with modifications) from either the MATLAB documentation on Nested Functions or the discussions in the comments of the related posts by Loren Shure.

  • A nested function can be called from
    – the level immediately above it.
    – a function nested at the same level within the same parent function.
    – a function at any lower level.
  • Nested functions are not accessible to the str2func or feval function. You cannot call a nested function using a handle that has been constructed with str2func. And, you cannot call a nested function by evaluating the function name with feval. To call a nested function, you must either call it directly by name, or construct a function handle for it using the @ operator.
  • As a rule, a variable used or defined within a nested function resides in the workspace of the outermost function that both contains the nested function and accesses that variable. The scope of this variable is then the function to which this workspace belongs, and all functions nested to any level within that function.
    The special case of varargin and varargout variables is particularly interesting: if a nested function includes varargin or varargout in its function declaration line, then the use of varargin or varargout within that function returns optional arguments passed to or from that function. If varargin or varargout are not in the nested function declaration but are in the declaration of an outer function, then the use of varargin or varargout within the nested function returns optional arguments passed to the outer function.
  • Variables containing values returned by a nested function are not in the scope of outer functions.
  • Externally scoped variables that are used in nested functions for which a function handle exists are stored within the function handle. So, function handles not only contain information about accessing a function, for nested functions, they also store the values of any externally scoped variables required to execute the function.
    It is interesting to note that whos fh_nested, where fh_nested is a function handle to a nested function, will only show the size of the function handle itself, not everything that it encapsulates. What it encapsulates can be examined through the structure struct_fh_nested = functions(fh_nested).
  • Variables cannot be “poofed” into the workspace of nested functions. [The way Loren Shure, so eloquently, put it.] The scoping rules for nested, and in some cases anonymous, functions require that all variables used within the function be present in the text of the M-file code. MATLAB issues an error if you attempt to dynamically add a variable to the workspace of an anonymous function, a nested function, or a function that contains a nested function. An important operation that causes variables to be dynamically added to the workspace is loading variables from a MAT file using load without an output, which can be avoided by using the form of load that returns a MATLAB structure.
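
The point about function handles carrying their externally scoped variables can be seen in a minimal sketch. The names makeCounter, next and count are hypothetical and chosen only for illustration; the sketch assumes nothing beyond the rules listed above.

```matlab
function fh = makeCounter(start)
% makeCounter  Return a handle to a nested function that counts upward.
% Hypothetical example; each call to makeCounter creates a fresh workspace.
count = start;      % externally scoped with respect to next()
fh    = @next;      % handles to nested functions must be built with @

    function n = next()
        count = count + 1;   % updates the state carried by this handle
        n     = count;
    end
end
```

With c1 = makeCounter(0) and c2 = makeCounter(10), two calls to c1() return 1 and then 2, while c2() returns 11: each handle keeps its own count. Calling functions(c1) returns a structure describing the handle, including the workspace it carries.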

The examples that follow illustrate what might be dubbed memory effects, or, as T. Driscoll calls them in Learning MATLAB, “lasting side effects.” (Side effects refer to the notion of functions in imperative programming, i.e., to the fact that the same language expression can result in different values depending on the state of the executing function. This implies referential opaqueness, or equivalently lack of referential transparency, which in turn implies impossibility of memoization.)

The first example features two functions that perform the same task of returning segments of an Excel sheet, advancing daily. An initial array is read during the first run of either function and is used as a cache for past data up to the current day. On a subsequent run, a second array is read and used as a cache for future data; on each call the current day's row is taken from it and appended to the past-data cache, whose oldest (i.e., first) row is dropped.
One of these functions is implemented as an ordinary M-file function (which could equally be made a subfunction of the invoking function) and the other as a nested function. The differences between the two implementations are highlighted.

Implementation based on Ordinary functions/Subfunctions


function [curDate,E] = readInputFileDaily_sub(inpFileName,inpSheet,...
                                              colRange,endDate,varargin)

persistent isFirstRun rowSheet rowE prevEs newEs newDates

optArgs   = {2,252,500};
emptyArgs = cellfun(@isempty,varargin);
[optArgs{~emptyArgs}] = varargin{~emptyArgs};

[sheetStartRow,rowInitEst,nDays2Read] = optArgs{:};

nCols = colRange{2} - colRange{1} + 1;

if isempty(isFirstRun)
    isFirstRun = false;
    rowSheet   = rowInitEst;
    inpRange   = [colRange{1} num2str(sheetStartRow) ':' ...
                  colRange{2} num2str(rowInitEst)];
    
    [prevEs,initDate] = xlsread(inpFileName,inpSheet,inpRange);
    
    notNan    = all(~isnan(prevEs),2);
    prevEs    = prevEs(notNan,:);
    initDate  = initDate(notNan,1);
    
    curDate   = initDate(end);
    E         = prevEs;
    return
end

while true
    rowSheet = rowSheet + 1;
    
    if isempty(newEs) || (rowE == size(newEs,1))
        rowE     = 1;
        inpRange = [colRange{1} num2str(rowSheet) ':' ...
                    colRange{2} num2str(rowSheet + nDays2Read - 1)];
        
        [newEs,newDates] = xlsread(inpFileName,inpSheet,inpRange);
    else
        rowE = rowE + 1;
    end
    
    curDate  = newDates(rowE);
    
    if strcmp(curDate,endDate) || isempty(curDate)
        [curDate,E] = deal({},[]);
        return
    end
    
    curRowE = newEs(rowE,:);
    if all(~isnan(curRowE)) && (length(curRowE) == nCols - 1)
        E = [prevEs(2:end,:);
            curRowE];
        break
    end
end

prevEs = E;

Implementation based on Nested functions


function fh_readInputFileCore = readInputFileDaily_nested(...
                                inpFileName,inpSheet,colRange,endDate,varargin)

optArgs   = {2,252,500};
emptyArgs = cellfun(@isempty,varargin);
[optArgs{~emptyArgs}] = varargin{~emptyArgs};

[sheetStartRow,rowInitEst,nDays2Read] = optArgs{:};

nCols = colRange{2} - colRange{1} + 1;

fh_readInputFileCore = @readInputFileCore;

[rowSheet,rowE]        = deal(0);
[prevEs,newEs]         = deal([]);
newDates               = {};

    function [curDate,E] = readInputFileCore()

        if isempty(prevEs)
            rowSheet = rowInitEst;
            inpRange = [colRange{1} num2str(sheetStartRow) ':' ...
                        colRange{2} num2str(rowInitEst)];

            [prevEs,initDate] = xlsread(inpFileName,inpSheet,inpRange);

            notNan    = all(~isnan(prevEs),2);
            prevEs    = prevEs(notNan,:);
            initDate  = initDate(notNan,1);

            curDate = initDate(end);
            E       = prevEs;
            return
        end

        while true
            rowSheet = rowSheet + 1;

            if isempty(newEs) || (rowE == size(newEs,1))
                rowE     = 1;
                inpRange = [colRange{1} num2str(rowSheet) ':' ...
                            colRange{2} num2str(rowSheet + nDays2Read - 1)];

                [newEs,newDates] = xlsread(inpFileName,inpSheet,inpRange);
            else
                rowE = rowE + 1;
            end

            curDate = newDates(rowE);

            if strcmp(curDate,endDate) || isempty(curDate)
                [curDate,E] = deal({},[]);
                return
            end

            curRowE = newEs(rowE,:);
            if all(~isnan(curRowE)) && (length(curRowE) == nCols - 1)
                E = [prevEs(2:end,:);
                    curRowE];
                break
            end
        end

        prevEs = E;
    end
end

[Figure: side-by-side differences of readInputFileDaily_sub and readInputFileDaily_nested]

Note that, except for isFirstRun, which is not used in readInputFileDaily_nested, the persistent variables of readInputFileDaily_sub (rowSheet, rowE, prevEs, newEs, newDates) reappear in readInputFileDaily_nested as variables that are externally scoped with respect to readInputFileCore, and they are saved in the function handle returned by readInputFileDaily_nested.

Invoking readInputFileDaily_sub


function maxlikeTransformedData_sub

nTradeDays    = 252;

inpFilePath   = ['.' filesep];
inpFileName   = 'LEHMQ_AIG';
inpFileName   = fullfile(inpFilePath,inpFileName);

colRange      = {'A','D'};
sheetStartRow = 2;
initEstRow    = sheetStartRow + nTradeDays;
endDate       = '9/15/2008';

firms         = {'AIG','LEHMQ'};

inpSheets     = cellfun(@(firmName) ['Input' firmName],firms,...
                                     'UniformOutput',false);

nFirms    = length(firms);
idxFirms  = 1:nFirms;
for iFirm = idxFirms
    clear readInputFileDaily_sub

    while true
        [curDate,E] = readInputFileDaily_sub(inpFileName,inpSheets{iFirm}, ...
                                             colRange,endDate,[],initEstRow);
        if isempty(curDate)
            break
        end
        disp(firms{iFirm}), disp(curDate), disp(E)
    end
end

Invoking readInputFileDaily_nested


function maxlikeTransformedData_nested

nTradeDays    = 252;

inpFilePath   = ['.' filesep];
inpFileName   = 'LEHMQ_AIG';
inpFileName   = fullfile(inpFilePath,inpFileName);

colRange      = {'A','D'};
sheetStartRow = 2;
initEstRow    = sheetStartRow + nTradeDays;
endDate       = '9/15/2008';

firms         = {'AIG','LEHMQ'};

inpSheets     = cellfun(@(firmName) ['Input' firmName],firms,...
                                     'UniformOutput',false);

nFirms    = length(firms);
idxFirms  = 1:nFirms;
for iFirm = idxFirms
    readNextSegment.(firms{iFirm}) = ...
        readInputFileDaily_nested(inpFileName,inpSheets{iFirm},...
                                  colRange,endDate,[],initEstRow);
end

while true
    for iFirm = idxFirms
        [curDate,E] = readNextSegment.(firms{iFirm})();
        if isempty(curDate)
            idxFirms(idxFirms == iFirm) = [];
            continue
        end
        disp(firms{iFirm}), disp(curDate), disp(E)
    end
    if isempty(idxFirms), break, end
end

[Figure: side-by-side differences of maxlikeTransformedData_sub and maxlikeTransformedData_nested]

The beauty of the code that uses nested functions should be apparent by now. For each firm (which corresponds to a sheet in the Excel file) in the cell array firms, the function readInputFileDaily_nested is invoked once to create a unique function handle to readInputFileCore, which is stored, using dynamic field names, in the field readNextSegment.(firms{iFirm}) of the structure readNextSegment. (This use of nested functions together with function handles can be thought of as creating lightweight objects in an OOP context.) The actual reading of data for each firm is done by invoking readInputFileCore through the function handle readNextSegment.(firms{iFirm}). As the reading state (i.e., the externally scoped variables rowSheet, rowE, prevEs, newEs, newDates) for each firm (i.e., sheet) is stored in its corresponding function handle, it is possible to read a segment of all sheets simultaneously, perform the computations that depend on all these data segments being present (such as calculating the joint probability of an event for the firms, after estimating the parameters required for each individual firm), and proceed to another segment.
Clearly, the implementation that uses ordinary functions (or subfunctions, for that matter) does not allow this simultaneous processing: at any point in time there can be only one set of persistent variables for the function readInputFileDaily_sub, whereas processing different sheets simultaneously requires independent sets of these variables at the same time. This is also the reason readInputFileDaily_sub has to be cleared from memory (clear readInputFileDaily_sub) before each sheet is processed, which ensures that values left in the persistent variables by previous invocations are not reused.
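
The independence of the per-handle state can be tried without an Excel file. The following is a simplified in-memory sketch, not the code above: makeWindowReader, data, winLen, window and row are hypothetical names, data is assumed to be a numeric matrix with at least winLen rows, and the NaN filtering and end-date check of the original are left out.

```matlab
function readNext = makeWindowReader(data,winLen)
% makeWindowReader  Hypothetical in-memory stand-in for
% readInputFileDaily_nested: data holds one row per day, winLen is the
% length of the rolling window (assumed not to exceed size(data,1)).
readNext = @next;
window   = [];      % plays the role of prevEs
row      = 0;       % plays the role of rowSheet

    function [curRow,E] = next()
        if isempty(window)
            % first call: fill the window with the initial rows
            row    = winLen;
            window = data(1:winLen,:);
        elseif row < size(data,1)
            % later calls: drop the oldest row, append the next one
            row    = row + 1;
            window = [window(2:end,:); data(row,:)];
        else
            % data exhausted: signal the caller with empty outputs
            [curRow,E] = deal([]);
            return
        end
        curRow = row;
        E      = window;
    end
end
```

Two readers created with a = makeWindowReader(magic(4),2) and b = makeWindowReader((1:5)',3) can be called in any interleaved order; each advances through its own data because row and window live in the workspace attached to its own handle. A version based on persistent variables would have a single row and a single window shared by every caller.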

As another beautiful use of the memory effects provided by nested functions, let’s look at an example excerpted, with modifications, from “Learning MATLAB.”
Suppose our “objective” is to find the smallest x such that the maximum real part of the eigenvalues of a certain matrix A(x) equals 1. This is a straightforward task using fzero. Now suppose that we want not only the value of x but also the eigenvalues of A(x) at the “optimal” x, without repeating the eigenvalue computation; this is a perfectly reasonable “constraint,” since that computation has already been done and should not need to be redone.
Coding the objective as a nested function makes short work of fulfilling this task.


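function x0 = findx
B  = diag(ones(49,1),1);       % 50-by-50, ones on the first superdiagonal
A  = B - B';                   % shared with objective, which modifies A(1,1)
x0 = fzero(@objective,[0 10]); % solve max(real(eig(A(x)))) - 1 = 0
plot(e,'*')                    % eigenvalues from the most recent objective call

    function r = objective(x)
        A(1,1) = x;            % only the (1,1) entry of A depends on x
        e = eig(A);            % stored in the workspace of findx
        r = max(real(e)) - 1;
    end
end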

Note again that the appearance of e in the body of the parent function findx (the highlighted line, plot(e,'*')) is critical for its value to be shared between the parent and nested functions.
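
The same pattern works for any by-product of an objective function. As a hedged illustration (findRootWithHistory, history and the objective cos(x) - x are hypothetical and not from “Learning MATLAB”), the sketch below records every point at which fzero evaluates the objective.

```matlab
function [x0,history] = findRootWithHistory
% findRootWithHistory  Hypothetical example: keep the points visited by
% fzero while solving cos(x) = x on [0 1].
history = [];                   % appears in the parent, so it is shared
x0      = fzero(@objective,[0 1]);

    function r = objective(x)
        history(end+1) = x;     % lasting side effect, visible to the parent
        r = cos(x) - x;
    end
end
```

Because history is initialized in the parent, the nested function appends to the parent's variable rather than to a local one, and the caller receives the full list of evaluation points along with the root.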

See also

John D’Errico’s loopchoose which is a “looped version of nchoosek. nchoosek can generate all combinations of a set of numbers. But sometimes that set can grow too large to store.” The solution is to generate each member of the set of all combinations in turn. loopchoose does exactly that.
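
The idea of producing one combination per call fits the nested-function pattern used in this post. What follows is a minimal sketch under that assumption; it is not John D’Errico’s implementation, and the names makeComboGenerator, next, combo, done and pos are hypothetical.

```matlab
function nextCombo = makeComboGenerator(n,k)
% makeComboGenerator  Hypothetical sketch (not loopchoose itself).
% Returns a handle that yields the k-element subsets of 1:n in
% lexicographic order, one per call, and [] once they are exhausted.
nextCombo = @next;
combo     = [];                     % current combination, shared with next()
done      = (k > n) || (k < 1);     % nothing to generate in these cases

    function c = next()
        c = [];
        if done
            return
        end
        if isempty(combo)
            combo = 1:k;            % first combination
        else
            % rightmost position that is still below its maximum value
            pos = find(combo < ((n-k+1):n),1,'last');
            if isempty(pos)
                done = true;        % last combination already returned
                return
            end
            % increment that position and reset everything to its right
            combo(pos:k) = combo(pos) + (1:(k-pos+1));
        end
        c = combo;
    end
end
```

Only the current combination is held in memory, so the full set never has to be stored. For n = 4 and k = 2, successive calls return [1 2], [1 3], [1 4], [2 3], [2 4], [3 4] and then [].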

Attachments

LEHMQ_AIG.xls
(The extension of this file has been changed to doc to allow uploading to WordPress.com. To be used without changes to the code in this post, this file should be renamed to LEHMQ_AIG.xls after downloading.)
