———C———
gcc -c -save-temps hello.c
After running this command, the preprocessed output will be available in the file ‘hello.i’. The -save-temps option also saves ‘.s’ assembly files and ‘.o’ object files in addition to preprocessed ‘.i’ files.
———
The complete set of predefined macros can be listed by running the GNU preprocessor cpp with the option -dM on an empty file:
cpp -dM /dev/null
Note that this list includes a small number of system-specific macros defined by gcc which do not use the double-underscore prefix. These non-standard macros can be disabled with the -ansi option of gcc.
———
The following options are a good choice for finding problems in C and C++ programs:
gcc -ansi -pedantic -Wall -W -Wconversion -Wshadow -Wcast-qual -Wwrite-strings
———
Occasionally a valid ANSI/ISO program may be incompatible with the extensions in GNU C. To deal with this situation, the compiler option -ansi disables those GNU extensions which are in conflict with the ANSI/ISO standard. On systems using the GNU C Library (glibc) it also disables extensions to the C standard library. This allows programs written for ANSI/ISO C to be compiled without any unwanted effects from GNU extensions. The non-standard keywords and macros defined by the GNU C extensions are asm, inline, typeof, unix and vax.
It is also possible to compile the program using ANSI/ISO C, by enabling only the extensions in the GNU C Library itself. This can be achieved by defining special macros, such as _GNU_SOURCE, which enable extensions in the GNU C Library.
The GNU C Library provides a number of these macros (referred to as feature test macros) which allow control over the support for POSIX extensions (_POSIX_C_SOURCE), BSD extensions (_BSD_SOURCE), SVID extensions (_SVID_SOURCE), XOPEN extensions (_XOPEN_SOURCE) and GNU extensions (_GNU_SOURCE).
The _GNU_SOURCE macro enables all the extensions together, with the POSIX extensions taking precedence over the others in cases where they conflict.
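As a minimal sketch of how a feature test macro is typically used: the macro has to be defined before the first system header is included (or passed on the command line with -D). The helper name duplicate is hypothetical. strdup is a POSIX function rather than part of ISO C, so on a glibc system compiled with -ansi its declaration should only be visible once a suitable macro such as _POSIX_C_SOURCE is defined.

```c
/* Feature test macros must appear before any system header is included. */
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <string.h>

/* 'duplicate' is a hypothetical helper name used only for this sketch. */
char *duplicate(const char *s)
{
    /* strdup is POSIX, not ISO C; the caller must free() the result. */
    return strdup(s);
}
```

The same effect can be had without touching the source by compiling with, for example, gcc -ansi -D_POSIX_C_SOURCE=200809L.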
———
When environment variables and command-line options are used together the compiler searches the directories in the following order:
1. command-line options -I and -L, from left to right
2. directories specified by environment variables, such as C_INCLUDE_PATH (for C programs), CPLUS_INCLUDE_PATH (for C++ programs) and LIBRARY_PATH
3. default system directories
———
The traditional behavior of linkers is to search for external functions from left to right in the libraries specified on the command line. This means that a library containing the definition of a function should appear after any source files or object files which use it. This includes libraries specified with the short-cut -l option. When several libraries are being used, the same convention should be followed for the libraries themselves. A library which calls an external function defined in another library should appear before the library containing the function. Most current linkers will search all libraries, regardless of order, but since some do not do this it is best to follow the convention of ordering libraries from left to right.
This is worth keeping in mind if you ever encounter unexpected problems with undefined references, and all the necessary libraries appear to be present on the command line.
When a program has been compiled using shared libraries it needs to load those libraries dynamically at run-time in order to call external functions. The command ldd examines an executable and displays a list of the shared libraries that it needs. These libraries are referred to as the shared library dependencies of the executable.
[Brian Gough, An Introduction to GCC, http://www.network-theory.co.uk/docs/gccintro/]
———
C doesn’t really “understand” array indexing, except in declarations. As far as the compiler is concerned, an expression like x[n] is translated into *(x+n) and use made of the fact that an array name is converted into a pointer to the array’s first element whenever the name occurs in an expression. That’s why, amongst other things, array elements count from zero: if x is an array name, then in an expression, x is equivalent to &x[0], i.e. a pointer to the first element of the array. So, since *(&x[0]) uses the pointer to get to x[0], *(&x[0] + 5) is the same as *(x + 5) which is the same as x[5]. A curiosity springs out of all this. If x[5] is translated into *(x + 5), and the expression x + 5 gives the same result as 5 + x (it does), then 5[x] should give the identical result to x[5]! If you don’t believe that, here is a program that compiles and runs successfully:
#include <stdio.h>
#include <stdlib.h>
#define ARSZ 20
int main(void){
int ar[ARSZ], i;
for(i = 0; i < ARSZ; i++){
ar[i] = i;
i[ar]++;
printf("ar[%d] now = %d\n", i, ar[i]);
}
printf("15[ar] = %d\n", 15[ar]);
exit(EXIT_SUCCESS);
}
———
int ar[20], *ip;
for(ip = &ar[0]; ip < &ar[20]; ip++)
*ip = 0;
That example is a classic fragment of C. A pointer is set to point to the start of an array, then, while it still points inside the array, array elements are accessed one by one, the pointer incrementing between each one. The Standard endorses existing practice by guaranteeing that it’s permissible to use the address of ar[20] even though no such element exists. This allows you to use it for checks in loops like the one above. The guarantee only extends to one element beyond the end of an array and no further.
Why is the example better than indexing? Well, most arrays are accessed sequentially. Very few programming examples actually make use of the `random access’ feature of arrays. If you do just want sequential access, using a pointer can give a worthwhile improvement in speed. In terms of the underlying address arithmetic, on most architectures it takes one multiplication and one addition to access a one-dimensional array through a subscript. Pointers require no arithmetic at all—they nearly always hold the store address of the object that they refer to. In the example above, the only arithmetic that has to be done is in the for loop, where one comparison and one addition are done each time round the loop. The equivalent, using indexes, would be this:
int ar[20], i;
for(i = 0; i < 20; i++)
ar[i] = 0;
———
char c; char *const cp = &c;
cp is a pointer to a char, which is exactly what it would be if the const weren’t there. The const means that cp is not to be modified, although whatever it points to can be—the pointer is constant, not the thing that it points to. The other way round is
const char *cp;
which means that now cp is an ordinary, modifiable pointer, but the thing that it points to must not be modified.
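A small sketch of the two cases, with hypothetical function names (count_char and capitalize_first are not from the source). The commented-out lines are the ones a compiler would be expected to reject.

```c
#include <ctype.h>
#include <stddef.h>

/* Pointer to const char: the pointer may move, the characters are read-only. */
size_t count_char(const char *s, char target)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {      /* modifying the pointer is allowed */
        if (*s == target)
            n++;
        /* *s = 'x'; */            /* rejected: the pointee is const */
    }
    return n;
}

/* Const pointer to char: the pointer is fixed, the character may change. */
void capitalize_first(char *const s)
{
    *s = (char)toupper((unsigned char)*s);   /* modifying the pointee is allowed */
    /* s++; */                               /* rejected: the pointer is const */
}
```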
———
The comma operator (,) is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the rightmost expression is considered.
/* comma used - this loop has two counters */
for(i=0, j=0; i <= 10; i++, j = i*i){
printf("i %d j %d\n", i, j);
}
/*
* In this futile example, all but the last
* constant value is discarded.
* Note use of parentheses to force a comma
* expression in a function call.
*/
printf("Overall: %d\n", ("abc", 1.2e6, 4*3+2));
———
The other way to initialize variables, known as constructor initialization, is done by enclosing the initial value between parentheses:
type identifier (initial_value);
For example:
int a(0);
Both ways of initializing variables are valid and equivalent in C++.
———
for(i=0; i <= 10; i++){
printf((i&1) ? "odd\n" : "even\n");
}
———
It is because of unions that structures cannot be compared for equality. The possibility that a structure might contain a union makes it hard to compare such structures; the compiler can’t tell what the union currently contains and so wouldn’t know how to compare the structures. This sounds a bit hard to swallow and isn’t 100% true (most structures don’t contain unions), but there is also a philosophical issue at stake about just what is meant by ‘equality’ when applied to structures. Anyhow, the union business gives the Standard a good excuse to avoid the issue by not supporting structure comparison.
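Since == is not available for structures, the usual workaround is a hand-written comparison that decides what ‘equality’ means member by member. The sketch below uses a hypothetical struct point and helper point_equal. Comparing members explicitly also sidesteps padding bytes, whose contents are unspecified and make a raw memcmp of two structures unreliable.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical structure used only for illustration. */
struct point {
    int  x;
    int  y;
    char label[8];
};

/* Two points are "equal" here if every member matches. */
bool point_equal(const struct point *a, const struct point *b)
{
    return a->x == b->x
        && a->y == b->y
        && strcmp(a->label, b->label) == 0;
}
```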
———
enum e_tag{
a, b, c, d=20, e, f, g=20, h
}var;
Just as with structures and unions, the e_tag is the tag, and var is the definition of a variable.
The names declared inside the enumeration are constants with int type. Their values are these:
a == 0, b == 1, c == 2, d == 20, e == 21, f == 22, g == 20, h == 21
———
Arrays, structures and unions are ‘derived from’ (contain) other types; none of them may be derived from incomplete types. This means that a structure or union cannot contain an example of itself, because its own type is incomplete until the declaration is complete. Since a pointer to an incomplete type is not itself an incomplete type, it can be used in the derivation of arrays, structures and unions.
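The classic consequence is the linked list node: the structure cannot contain another struct node, but it can contain a pointer to one. A minimal sketch, with hypothetical names node and list_length:

```c
#include <stddef.h>

struct node {
    int          value;
    struct node *next;   /* pointer to the (still incomplete) type: allowed */
    /* struct node inner; */   /* rejected: struct node is incomplete here */
};

/* Count the nodes by following next pointers until NULL. */
size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```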
———
As a word of warning: in a function definition, a typedef name can only supply the type of the return value, not the overall type of the function. The overall type includes information about the function’s parameters as well as the type of its return value, and a definition has to spell out its parameters itself.
/*
* Using typedef, declare 'func' to have type
* 'function taking two int arguments, returning int'
*/
typedef int func(int, int);
/* ERROR */
func func_name{ /*....*/ }
/* Correct. Returns pointer to a type 'func' */
func *func_name(){ /*....*/ }
/*
* Correct if functions could return functions,
* but C can't.
*/
func func_name(){ /*....*/ }
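Where a function typedef does help is in declarations and in pointers to functions. The sketch below reuses the typedef from the text; add, subtract and choose are hypothetical names introduced for illustration.

```c
typedef int func(int, int);

/* A function typedef may declare functions... */
func add;        /* same as: int add(int, int);      */
func subtract;   /* same as: int subtract(int, int); */

/* ...but each definition must still write out its parameter list. */
int add(int a, int b)      { return a + b; }
int subtract(int a, int b) { return a - b; }

/* Returning a pointer to a 'func' is allowed. */
func *choose(int want_add)
{
    return want_add ? add : subtract;
}
```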
———
const int ci = 123;
/* declare a pointer to a const.. */
const int *cpi;
/* ordinary pointer to a non-const */
int *ncpi;
cpi = &ci;
/*
 * this needs a cast
 * because it is usually a big mistake,
 * see what it permits below.
 */
ncpi = (int *)cpi;
/*
 * now to get undefined behavior...
 * modify a const through a pointer
 */
*ncpi = 0;
———
The header ‘signal.h’ declares a type called sig_atomic_t which is guaranteed to be modifiable safely in the presence of asynchronous events. This means only that it can be modified by assigning a value to it; incrementing or decrementing it, or anything else which produces a new value depending on its previous value, is not safe.
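A minimal sketch of the usual pattern: the handler does nothing except assign to a volatile sig_atomic_t flag, and ordinary code inspects the flag later. The names interrupted, on_interrupt and raise_and_check are hypothetical, and raise is used here only so the example delivers the signal to itself.

```c
#include <signal.h>

/* Flag shared between the handler and normal code. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;   /* plain assignment: the only safe modification */
}

/* Install the handler, send SIGINT to ourselves, report the flag. */
int raise_and_check(void)
{
    interrupted = 0;
    if (signal(SIGINT, on_interrupt) == SIG_ERR)
        return -1;
    raise(SIGINT);     /* the handler runs before raise() returns */
    return interrupted;
}
```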
———
There is special treatment for places in the macro replacement text where one of the macro formal parameters is found preceded by #. The token list for the actual argument has any leading or trailing white space discarded, then the # and the token list are turned into a single string literal. Spaces between the tokens are treated as space characters in the string. To prevent ‘unexpected’ results, any " or \ characters within the new string literal are preceded by \.
This example demonstrates the feature:
#define MESSAGE(x) printf("Message: %s\n", #x)
MESSAGE (Text with "quotes");
/*
* Result is
* printf("Message: %s\n", "Text with \"quotes\"");
*/
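A common use of the # operator is printing an expression next to its value. In this sketch STRINGIFY, SHOW, stringified and show_sum are hypothetical names. The extra spaces in the argument illustrate the white space rule described above: leading and trailing white space is dropped, and runs of white space between tokens collapse to a single space.

```c
#include <stdio.h>

#define STRINGIFY(x) #x
#define SHOW(expr)   printf("%s = %d\n", #expr, (expr))

/* Yields the string literal "Text with \"quotes\"". */
const char *stringified(void)
{
    return STRINGIFY(  Text   with "quotes"  );
}

/* Prints the expression text followed by its value: 2 + 3 = 5 */
void show_sum(void)
{
    SHOW(2 + 3);
}
```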
———
#define REPLACE some replacement text
#define JOIN(a, b) a ## b
JOIN(REP,LACE)
/* becomes, after token pasting, REPLACE */
/* becomes, after rescanning, some replacement text */
———
Replacement is suppressed only if the macro name results directly from replacement text, not from the other source text of the program. Here is what we mean:
#define m(x) m((x)+1)
/* so */
m(abc);
/* expands to */
m((abc)+1);
/*
 * even though the m((abc)+1) above looks like a macro,
 * the rules say it is not to be re-replaced
 */
m(m(abc));
/*
 * the outer m( starts a macro invocation,
 * but the inner one is replaced first (as above)
 * with m((abc)+1), which becomes the argument to the outer call,
 * giving us effectively
 */
m(m((abc)+1));
/*
 * which expands to
 */
m((m((abc)+1))+1);
If that doesn’t make your brain hurt, then go and read what the Standard says about it, which will.
———
#define TEST(x) if(!(x))\
printf("test failed, line %d file %s\n",\
__LINE__, __FILE__)
There’s only one minor caveat: the use of the if statement can cause confusion in a case like this:
if(expression)
TEST(expr2);
else
statement_n;
The else will get associated with the hidden if generated by expanding the TEST macro. This is most unlikely to happen in practice, but will be a thorough pain to track down if it ever does sneak up on you. It’s good style to make the bodies of every control of flow statement compound anyway; then the problem goes away.
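A widely used alternative is to wrap the macro body in do { ... } while (0), which makes the expansion a single statement so a following else cannot attach to the hidden if. This is a sketch of that idiom applied to TEST, not something the source proposes; outer_branch_taken is a hypothetical function used to show which branch is reached.

```c
#include <stdio.h>

/* The do/while(0) wrapper turns the expansion into one complete statement. */
#define TEST(x) \
    do { \
        if (!(x)) \
            printf("test failed, line %d file %s\n", __LINE__, __FILE__); \
    } while (0)

/* Returns 1 when 'expression' is true, 0 when the else branch runs. */
int outer_branch_taken(int expression, int expr2)
{
    if (expression)
        TEST(expr2);
    else
        return 0;      /* now pairs with the outer if, as intended */
    return 1;
}
```

With the original definition of TEST, calling outer_branch_taken(1, 1) would reach the else branch, because the else would bind to the if inside the macro.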
[Mike Banahan, Declan Brady and Mark Doran, The C Book, http://publications.gbdirect.co.uk/c_book/]
———RegExp———
Windows text files use \r\n to terminate lines, while UNIX text files use \n.
When editing text, doubled words such as “the the” easily creep in. Using the regex \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply type in \1 as the replacement text and click the Replace button.
There is a difference between a backreference to a capturing group that matched nothing, and one to a capturing group that did not participate in the match at all. The regex (q?)b\1 will match b. q? is optional and matches nothing, causing (q?) to successfully match and capture nothing. b matches b and \1 successfully matches the nothing captured by the group.
The regex (q)?b\1 however will fail to match b. (q) fails to match at all, so the group never gets to capture anything at all. Because the whole group is optional, the engine does proceed to match b. However, the engine now arrives at \1 which references a group that did not participate in the match attempt at all. This causes the backreference to fail to match at all, mimicking the result of the group. Since there’s no ? making \1 optional, the overall match attempt fails.
^\d*$ would successfully match an empty string. Let’s see why.
There is only one “character” position in an empty string: the void after the string. The first token in the regex is ^. It matches the position before the void after the string, because it is preceded by the void before the string. The next token is \d*. As we will see later, one of the star’s effects is that it makes the \d, in this case, optional. The engine will try to match \d with the void after the string. That fails, but the star turns the failure of the \d into a zero-width success. The engine will proceed with the next regex token, without advancing the position in the string. So the engine arrives at $, and the void after the string. We already saw that those match. At this point, the entire regex has matched the empty string, and the engine reports success.
\b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b.
With the question mark, I have introduced the first metacharacter that is greedy. The question mark gives the regex engine two choices: try to match the part the question mark applies to, or do not try to match it. The engine will always try to match that part. Only if this causes the entire regular expression to fail, will the engine try ignoring the part the question mark applies to.
The effect is that if you apply the regex Feb 23(rd)? to the string Today is Feb 23rd, 2003, the match will always be Feb 23rd and not Feb 23. You can make the question mark lazy (i.e. turn off the greediness) by putting a second question mark after the first.
The regex engine does not permanently substitute backreferences in the regular expression. It will use the last match saved into the backreference each time it needs to be used. If a new match is found by capturing parentheses, the previously saved match is overwritten. There is a clear difference between ([abc]+) and ([abc])+. Though both successfully match cab, the first regex will put cab into the first backreference, while the second regex will only store b. That is because in the second regex, the plus caused the pair of parentheses to repeat three times. The first time, c was stored; the second time, a; and the third time, b. Each time, the previous value was overwritten, so b remains.
[Jan Goyvaerts, http://www.regular-expressions.info]
———MATLAB———
try-catch is much faster than programmatic logic testing. This is true in general, and is not specific to MATLAB (i.e., it also holds true for Java, C++, ...). This is because at the machine-code level, try instrumentation translates into a single trap/wait (or equivalent) instrumentation opcode taking microseconds, whereas actual programmatic testing may take milliseconds or longer.
[Yair Altman, http://undocumentedmatlab.com/blog/editormacro-assign-a-keyboard-macro-in-the-matlab-editor/]
———SHELL———
Frequency of word usage in file
cat file | deroff -w | tr A-Z a-z | sort | uniq -c | sort -rn
The deroff -w filter divides the text into words, one per line (at the same time removing any troff commands that may be present), tr A-Z a-z converts all words to lower case, sort sorts the list, uniq -c converts repeated lines into a single line preceded by a count of how many times the line occurred, and sort -rn sorts on the numeric count field in reverse order (largest to smallest).
[Nick Higham, Handbook of Writing for the Mathematical Sciences, p. 220]
———
Running processes in the background from MATLAB
unset DISPLAY
nohup matlab < programFile.m > programStdOutput.out 2> programStdError.err &
If an interactive command has been executing for a long time and you did not “nice” it, you can suspend it and lower its priority with renice.
Suspend: C-z (resume afterwards with bg or fg)
Find PID: ps aux | grep programName
Alter priority of the corresponding process: renice -n 10 -p PID_programName (10 is only an example increment)
[Shadlen Lab, http://www.shadlen.org/HowTos/Matlab]
———
An event designator is a reference to a command line entry in the history list.
!
Start a history substitution, except when followed by a space, tab, the end of the line, ‘=’ or ‘(’ (when the extglob shell option is enabled using the shopt builtin).
!n
Refer to command line n.
!-n
Refer to the command n lines back.
!!
Refer to the previous command. This is a synonym for ‘!-1’.
!string
Refer to the most recent command starting with string.
!?string[?]
Refer to the most recent command containing string. The trailing ‘?’ may be omitted if the string is followed immediately by a newline.
^string1^string2^
Quick Substitution. Repeat the last command, replacing string1 with string2. Equivalent to !!:s/string1/string2/.
!#
The entire command line typed so far.
[Bash Reference Manual, http://www.gnu.org/software/bash/manual/html_node/Event-Designators.html]
———
Consecutively executing programs
g++ prog1.cpp ; ./a.out ; gdb a.out
or
g++ prog1.cpp && ./a.out || gdb a.out
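The first form runs all three commands regardless of whether the previous one succeeded. The second form won’t run ./a.out unless prog1.cpp compiles properly, and runs gdb if either the compilation fails or a.out exits with a non-zero status (for example, after terminating abnormally).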
———
Execute by number:
history | grep 2000
!2000
———
Build a symbolic link to a particular version of gcc
cd /usr/bin
ln -s gcc-install-path/bin/gcc gcc-3.4.5
———
This recursively finds all files ending with .cpp and ensures a newline is at the end of the file.
find . -iname "*.cpp" -exec perl -ni -e 'chomp; print "$_\n"' {} \;
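This replaces every occurrence of _finite with finite, in place, in the .cpp files of the current directory.
sed -i 's/_finite/finite/g' *.cpp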
———
Add a user, specifying their home directory, primary group, a dozen supplementary groups (which grant the corresponding privileges), and bash as the login shell
sudo useradd geek --home-dir /home/geek --create-home --gid admin --groups 4,20,24,25,29,30,44,46,107,109,115,124 --shell /bin/bash
———
In the GNU Bash shell the command
ulimit -c
controls the maximum size of core files. If the size limit is zero, no core files are produced.
———EMACS———
LISP command to add to .emacs for .cu files so that they are treated like .cpp files
(setq auto-mode-alist (append
'(("\\.cu$" . c++-mode))
auto-mode-alist))
———Makefile———
These are the “core” automatic variables:
$@ The filename representing the target.
$% The filename element of an archive member specification.
$< The filename of the first prerequisite.
$? The names of all prerequisites that are newer than the target, separated by spaces.
$^ The filenames of all the prerequisites, separated by spaces. This list has duplicate filenames removed since for most uses, such as compiling, copying, etc., duplicates are not wanted.
$+ Similar to $^, this is the names of all the prerequisites separated by spaces, except that $+ includes duplicates. This variable was created for specific situations such as arguments to linkers where duplicate values have meaning.
$* The stem of the target filename. A stem is typically a filename without its suffix. (We'll discuss how stems are computed later in the section “Pattern Rules.”) Its use outside of pattern rules is discouraged.
[Robert Mecklenburg, Managing Projects with GNU Make]
