The Rough Guide to the C Programming Language

This page is dedicated to all those who will never really be able to work out the differences between pointers and a subset of the positive integers. C was first developed at AT&T and many new features and programs were created at the University of California at Berkeley.

Historical Context

A compiler translates a human readable language to machine instructions. Perhaps the first compiler was Ada, Countess of Lovelace. She compiled the instructions for calculating Bernoulli numbers on Babbage's analytical engine. These machine instructions comprised addition, multiplication, subtraction, division and moving numbers between registers. It's not so much different today, except that compilers are programs and the human translators are called 'systems analysts'.

Nowadays the process is that the systems analyst translates user specifications to orders for hardware and software, often involving bribery and back-hand payments, and also translates ill-considered wish lists to action plans which are then handed to programmers to translate into machine-readable instructions with the assistance of compilers and interpreters.

          Analyst       Programmer + compiler
         Wishlist  ->   Action plan  ->  Computer instructions.

Sometimes the problems arising from ill considered specifications will be treated in a systematic fashion by people working for the love of it. Many useful image conversion programs and other software tools arise this way.

There are cases when the writing of software can go badly wrong and inadequate and erroneous systems or malicious code can proliferate onto many machines. Both viruses and delayed or cancelled projects are symptoms of the inherent contradictions in monopoly capitalism.

Media elites are often afraid of those with the skills and dedication required to work on file and format conversions. American lawyers tried to get criminal proceedings and confiscations against a Norwegian schoolboy who wrote a program to convert the DVD format to ordinary video.

A compiler translates a source language into machine code. The machine code must be installed and run by another program, normally the operating system. Nowadays compilers are used to write about 99.9 percent of the operating system.

An interpreter reads a source language and translates statements and runs them on the fly. Interpreted languages usually run much slower than compiled languages. In the past interpreted languages were much more interactive than compiled languages. Nowadays some compilers come with interpreted versions for development use.

  Algol 60     Committee                       1960
  CPL          Cambridge & London Universities 1963
  BCPL         Martin Richards                 1967
  B            Ken Thompson, Bell Labs         1970
  C            Dennis Ritchie, Bell Labs       1972
  C++                                          ??
  C#           Microsoft                       2000

Algol is an acronym for algorithmic language while CPL stands for combined programming language. BCPL is CPL prefixed by Basic. C++ and C# are self describing.

Algol 60 introduced Backus-Naur metalanguage for specifications. This was a great advance because the full language definition only took about five pages. The important thing is that the Backus-Naur metalanguage makes extensive use of circular references. This description was used extensively by Japanese PC manufacturers in the early 1980s.

Most recent languages such as Perl, Java and JavaScript borrow extensively from the syntax of C.

The first C program that most people learn is quite short.

#include <stdio.h>
int main(void) {
  printf("Hello World\n");
  return 0;
  }

Lexical Structure

C is an ASCII language; that is to say it is a written language with 95 possible characters. C has features in common with many other languages.

  • Comments can be distinguished from the program.
  • Spaces are needed to separate words of the program.
  • Strings are enclosed in quotes.
  • Numbers are digit strings including an optional decimal point.
  • Identifiers are alpha-numeric strings starting with a letter.
  • Operators are symbols such as '+','-','*','/', etc.
  • Brackets such as '(',')','[',']' come in pairs.

Comments are introduced by '/*' and terminated with '*/'. White space consists of blanks, horizontal tab characters and newlines. C uses the backslash character at the end of a line to join that line to the next one, but it is seldom necessary to do this because statements may be split across lines freely.

In fact identifiers or names may also include the underscore character '_'. There are also widely observed conventions about names. Names starting with underscore are often used in system libraries, and names consisting exclusively of upper case letters are often used to denote symbolic constants or macros defined in #include files. The backslash character is special; its main use is the creation of non-printable characters. Typical examples are:-

  escape      meaning         hex   octal  decimal
  \b          back space      0X08  010     8
  \n          newline         0X0A  012    10
  \r          carriage return 0X0D  015    13
  \t          horizontal tab  0X09  011     9

C Syntax

Shortened Backus-Naur definition of C. This is an environmentally friendly means of presenting definitions because circular definition saves paper. The meta-character '|' means or. The string '::=' means the same as. A definition such as “B::= 0 | 1 | B 0 | B 1” means a string consisting of the characters 0 and 1 with at least one character.

      See <ctype.h>

      Digits
      digit ::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
      also write as 0-9.

      <empty> ::= Empty string. No characters.

      Letters:
      A::= A-Z and a-z        Upper or lowercase letter
      C::= A | '_'            Letter or underscore.
      X::=                    any printable character
      e::= 'E'|'e'            Exponent character
      .::= '.'                Decimal point
      I::= C | I C | I digit  C-symbol
      S::= <empty> | S X      String of characters

      Operators

      f::= '+' | '-' | '*' | '/' | '%' | '<' | '>' | '&' | '~' | '?' |
           '^' | '!' | '|' | '=='| '<=' | '>=' | '!=' | '||' | '&&'

      Separators. Like punctuation.

      P::= '(' | ')' | '[' | ']' | ',' | ';' | '{' | '}'

      T     Text
      T::=  "S"                               Double quotes

      N     Numeric literal
      UI::= 'X'                               Character constant
      UI::= digit | UI digit                  Unsigned integer
      int::= UI | + UI | - UI                 Signed integer
      R::=  int . digit | R digit             Decimal number
      N::=  int | R | R e int                 General type

      Expression
      E::=  D | T | I | I (E-list) | (E) | + E | E + E | L=E
      E-list::=  empty | E | E-list , E
      +::=  f | I             Function or C-symbol
      -::=  ++ | --           Prefix/Postfix operator
      f::=  Primitive function, or operator
      D::=  N | T             Data
      E::=  I                         Identifier
      E::=  E.E  | E->E               Structure selection
      E::=  +E                        Monadic operator
      E::=  E+E                       Dyadic  operator
      E::=  (E)                       Priority of evaluation
      E::=  E[E]                      Indexing.
      E::=  L=E                       Assignment
      E::=  L f= E                    L = L f E
      E::=  - L | L -                 Prefix / postfix operator
      L::=  I | I[E]                  L-Value

An identifier consists of a sequence of letters, digits, and the underscore (_). The starting character must be either a letter or underscore.

A numeric token is a sequence starting with a digit or sign and including an optional decimal point and an exponent string consisting of the letter 'E' or 'e' followed by a number which may have a sign (+ or -). Because the minus sign, '-', is used as an arithmetic operator the

Text literals consist of any string between double quotes ("). If the double quote is itself to be included in a string then it must be escaped with backslash (\).

Statements consist of one or more expressions, separated by commas. These expressions are evaluated in sequence and the result of the statement is the last value calculated. Most operators group from left to right, while assignment and the unary operators group from right to left, with certain rules for priority of evaluation. A compound statement is a sequence of statements, each terminated by a semicolon (;), enclosed in braces. A series of statements may be made into a function. Functions must have an argument list consisting of zero or more identifiers enclosed in parentheses. The first line of the function acts as a template. Functions are defined in source files. There is a way of sharing variables between some source files but excluding them from others.

A function definition consists of two parts: the function header, and the function body. Notice that <BODY> has itself become a keyword in HTML.

Here <empty> stands for a null text string, that is to say a text string with possibly no characters at all. The symbol 'I' stands for an identifier as defined in the section on syntax.

Data Types & Storage Classes

C is a typed language. That means that the programmer must specify in great detail every item of data to be manipulated by the program. If the programmer does not want to do this, older C compilers will make an assumption that the data is of integer, or number type.

Primitive data types are normally chunks of from 1 to 16 bytes, and these reflect the underlying computer hardware. To the programmer the most frequent are 'char', 'unsigned char', 'int', 'unsigned int', 'long', 'short','float','double', 'long double' and 'long long'. These data types are meant to be machine independent. The same C source code should work irrespective of the byte-sex of the target machine. Logical values, representing true or false may be represented as almost any of these data types, with the convention that zero means false.

Many text books show drawings of memory and memory locations to describe how things are stored in the computer.

  Bits are numbered in a byte. There are two conventions.

      .--.--.--.--.--.--.--.--.
      | 0| 1| 2| 3| 4| 5| 6| 7|
      .--.--.--.--.--.--.--.--.

      .--.--.--.--.--.--.--.--.
      | 7| 6| 5| 4| 3| 2| 1| 0|
      .--.--.--.--.--.--.--.--.

Bytes are numbered within a sixteen bit word. When the word is used for arithmetic there are two distinct conventions. The most significant byte may come either before or after the least significant byte in the computer's address space. These two conventions are both used. Intel chips store the least significant byte first. Motorola chips generally do the opposite.

       Intel            MC68000
      .--.--.           .--.--.
      |LO|HI|           |HI|LO|
      .--.--.           .--.--.

32-bit numbers may be stored in several formats. If a number N is expressed as four bytes N= b0 + b1*256 + b2*256^2 + b3*256^3 then the two most common storage forms are:

          Intel          Motorola
      .--.--.--.--.   .--.--.--.--.
      |b0|b1|b2|b3|   |b3|b2|b1|b0|
      .--.--.--.--.   .--.--.--.--.

      .--.--.--.--.
      |b2|b3|b0|b1|     Occasional alternative form.
      .--.--.--.--.

These different arrangements are called 'byte-sex'. For most applications it is not necessary to know the byte sex, but there are some graphic file formats which are not 100% convertible between different machines.

  Example: Storage of the string "Hello\n" in 16-bit computer.
       Address  Data
       .--.--. .--.
       |01|00| |H |  Letters are used instead of ascii values.
       .--.--. .--.
       |  |01| |e |
       .--.--. .--.
       |  |02| |l |
       .--.--. .--.
       |  |03| |l |
       .--.--. .--.
       |  |04| |o |
       .--.--. .--.
       |  |05| |0A|  Line feed.
       .--.--. .--.
       |  |06| |00|  Strings end with a zero byte.
       .--.--. .--.

These drawings may be useful, but they don't always show that there is never really enough memory for the cutting edge state of the art programs that your competitors are running with no problems. Byte sex comes from how individual bytes make up longer integers used in arithmetic.

We also know that public funded projects are always running their software on computers without enough memory because of budget cuts or occasionally outright corruption.

The model for C and for any other programming language designed for a von Neumann machine is straightforward. Address space is a subset of the positive integers. Address space is conceptually divided into space taken by program instructions and space taken for the storage of user data. Instructions and data look the same, and share this address space.

A pointer type represents the location of something in the computer's memory. While the language syntax of using pointers is very precise the results of using pointers are extremely problematic. There is no effective way of finding errors in programs which misuse pointers. The best thing to do is to write programs which are always correct. C has many facilities to help people do this. Most compilers will print warnings when pointers are incorrectly used and many will terminate after a certain threshold of warnings is printed. Sometimes excessive error checking is quite annoying, but compiler checking is even worse in Pascal or Modula 2.

Data types recognised by printf, with the most common at the top of the list.

 'c'        A single character.
 'd'        A signed integer.
 's'        A NUL-terminated string.
 'p'        A pointer.  This is usually printed in hexadecimal.
 'f'        Fixed float.  Use "%w.pf" for width w and precision p.
 'e' 'E'    Floating point number with exponent.
 'g' 'G'    General. Use exponent if necessary.
 'i'        A signed integer, the same as 'd'.
 'o'        Octal. Integer printed in base 8 instead of base 10.
 'D'        A signed long integer. Non-standard; use "%ld".
 'u'        An unsigned integer.
 'U'        An unsigned long integer. Non-standard; use "%lu".
 'h'        Modifier for short, as 'l' for long ints or 'L' for long doubles.
 'x' 'X'    Hexadecimal, integer in base 16 instead of base 10.

Storage classes are important. In C these are given the names 'auto', 'register', 'static' and 'extern'. Auto variables are those which represent on-stack storage. Static variables keep their values for the whole run of the program. Global variables, defined outside any function, are those which can be accessed by all routines in the program. By the time C was invented large programs were always dying because one routine inadvertently corrupted data used by another. C generally requires programmers to name variables which may be used by other routines. A variable is defined as normal in just one source file, and other source files which require this variable must prefix its type and name with the keyword 'extern'. Almost all other programming languages have a mechanism for using global variables on a 'need to know' basis just like a series of terrorist cells only divulge minimum necessary knowledge to the acolytes.

Arrays And Tables

In C it is possible to define arrays of data elements. Both the definition and use require the square brackets '[' and ']'. The length of an array is the number of elements. C keeps things simple. Each array has only one dimension, although its elements may themselves be arrays, and subscripts start with zero.

  Array declarations can be defined in Backus-Naur notation.

  type        ::= 'enum' | 'char' | 'int' | 'float' | 'double'
  qualifier   ::= <empty> |  'unsigned' | 'short' | 'long'
  qualifier   ::= qualifier qualifier
  dimension   ::= <empty> | [] | [Numeric-literal]
  dimension   ::= dimension dimension
  tabulator   ::= dimension | '*'
  declaration ::= qualifier type name dimension;

The size of an array must be known at compile time. If it is not given explicitly then an empty pair of square brackets may be used, and the compiler counts the initial values.

  Example:

  char greeting[] = {'H','e','l','l','o','\0'};

This declares an array of six characters, with initial values. In fact a shorter way of getting a similar definition is char* greeting="Hello"; although this declares a pointer to a constant string rather than an array.

The use of arrays or tables is necessary in many programming applications. The maths technique of selection without replacement is perhaps one of the most common mass market applications. Many people want algorithms to select lottery numbers or even generate PIN numbers.

C And Generating Lottery Numbers

Full worked example.

A 'National Lottery' requires a user to select N different numbers from 1 to M without replacement. The English National Lottery requires six numbers from 49 to win the jackpot but the gambler must select seven numbers when filling in the form. The program is to be called lottery.c and it accepts two numbers as input, given on the command line. The program makes an apparently random selection. The program may seem long and complicated but there are many people who want to know how such processes work. For a start the model of a jar with coloured balls and random selection is common in applications such as quantum theory, cryptography and monitoring the stock market. It is also an application where there is no simple method of achieving the result without using arrays and subscripting.

lottery.c
  /*      select n numbers from {1,2,...m}  based on time() */

  #include <stdio.h>
  #include <stdlib.h>     /* atoi, srand, rand */
  #include <time.h>

  #define MAX_BALLS       1000

  /* hold numbered balls in an urn or jar */

  int     urn[MAX_BALLS];

  int main(int argc, char **argv)
  {
  int     n, m;
  int     i, k;

  if (argc < 3) {
      printf ("Usage: %s number-to-select number-of-lottery-balls\n", argv[0]);
      return -1;
      }

  n = atoi(*++argv);
  m = atoi(*++argv);

  /* sanity check */
  if (n > MAX_BALLS || m > MAX_BALLS || n > m || n < 0 || m < 0) {
  /* brush off user with a cryptic error message */
      printf("argument error. Don't bet today\n");
      return -1;
      }
  printf ("Selecting %d balls from %d without replacement\n", n, m);

  /* deterministic randomize */
  srand((unsigned)time(NULL));

  for (i=0; i<m; i++) urn[i]=0;
  for (i=0; i<n; i++) {           /* select n balls */
       for (k = rand() % m; urn[k]; k = rand() % m);
       urn[k]=1;
       }
  for (i = 0; i < m; i++) {
      if (urn[i]) printf("%d ", 1+i);
      }
  printf("\n");
  return  0;
  }

Notes:

(1) argv, argc is a standard C-method of taking parameters from the command line. It allows the program to be used in different contexts.

(2) atoi() is a library function converting an ASCII string to an integer. Its declaration is in the include file <stdlib.h>.

(3) The line srand((unsigned)time(NULL)) uses many of the features of C. The expression '(unsigned)' is called a cast. It forces the result of the library function 'time()' to be treated as an unsigned integer. This value is fed into the standard library function called 'srand()' which sets the pseudo random number stream. Donald E. Knuth has written extensively on random number generators in his opus 'The Art of Computer Programming'. These chapters are some of the easiest and most interesting to read. Getting a better method of randomisation is tedious. Some people use mouse movements to set up an entropy pool, but identical robots might generate the same seed value.

(4) The nested 'for' loops do all the work. The status of the array 'urn' tells the full story. Values of urn[x] are 0 or 1 depending on whether x has already been selected.

(5) The selected values are printed in ascending order. For winning the lottery the order in which the numbers are selected is not important.

To compile the program type a command line:

  gcc lottery.c  -o lottery.exe       Dos/Windows
  gcc lottery.c  -o lottery           Linux

  To run the program.

  lottery     7 49                    Dos

  chmod a+x lottery                   Linux
  ./lottery   7 49

  Sample result:
  lottery 7 49
  Selecting 7 balls from 49 without replacement
  12 24 29 38 39 40 45

Operators

The first set of operators can be found on any calculator. These are the four operations of arithmetic, plus '+', minus '-', times '*', and divide '/'. The next most common is the equals sign '=', for moving things about registers. In C the operators consist of one or two special symbols. Operators can be strung together with names rather like the symbolic expressions taught in algebra classes at schools. These are infix operators: that is they are used in a context such as x+y where the operator is written between the data to which the operator is applied. The equals sign '=' is called the assignment operator. The expression a=b; means store b into a.

The most common arithmetic operators work on characters, integers, floating point values, and sometimes they work in mixed expressions with pointers and numeric values. The value of the result depends on the types of the operand.

  +   add
  -   subtract
  *   multiply
  /   divide
  %   remainder

When these operators are used in assignment statements such as “x = x + a”, then the assignment operator pair may be shortened to “x += a”, etc. The very first C compilers accepted “x =+ a” but this was only in the 1970s.

Relational operators are similar to arithmetic operators. They are written between a pair of data values and the result is a logical value true, or false. The most important relational operators are

  <   less than
  >   greater than
  <=  less than or equal
  >=  greater than or equal
  !=  not equal
  ==  equal

All of these work on characters, numbers and pointers.

C also has operators that work on the individual bits of bytes and integers. These need to be used with care because there are many places for the compiler writer to go wrong when dealing with mixed data types.

  &   bitwise and
  |   bitwise or
  ^   exclusive or
  ~   unary not.      Invert each bit.

C has the powerful concept of letting some operators stand for complex logical constructions. '&&', '||' and '?' represent control structures. Truth values represented by zero and non-zero may be manipulated by logical operators.

  &&  logical and
  ||  logical or
  !   logical not

The designers of C were very clever in their specifications. The expression 'A && B' is true if both A and B are true, but if A is false then B is not evaluated at all. Similarly in the condition 'A || B' the result is the logical or of A and B. If A is true then B is not evaluated and the value 'true' is returned for the whole expression.

Control Statements

There are several. The most important are 'if', 'else', 'for', 'while', 'switch', 'return'. The symbols '{' and '}' are used to mark blocks of statements. Every C program must also contain the word 'main'. In fact many C programs may be written using only the words 'int', 'main', 'for', 'printf' and single letter names and operational symbols.

The 'if' statement has the syntax:

  if (condition) action;

Here condition is an expression and action is either a single statement or a compound statement enclosed in braces '{' and '}'. The keyword 'if' is just about the most common keyword used in C-programs. The condition part of the expression is evaluated and if non zero the action part of the statement is evaluated. It is also possible to specify an alternative course if the condition is false: use the keyword 'else'.

  if (condition) action1;
  else           action2;

A series of conditions and actions can be chained together with if and else. Conditional statements may be embedded into the actions by use of braces.

Example:

  if (a==0) {if (b>0) c=b; else c=-b;}
  else if (a==1) s="foo";
  else s = NULL;

It is important to distinguish between the operator '=' for assignment and '==' as a comparison operator. The effect of a line such as:-

  if (a=1) die();

is to set the value a=1 and use the value 1 or 'true' in the conditional expression. This means that the function die() is _always_ invoked in this code fragment. This misuse of '=' is one of the most common sources of error.

It is also possible to write simple conditionals with the '?' and ':' operators. The syntax is:-

  condition ? action_1 : action_2;

Here condition, action_1 and action_2 are expressions, and the value of the whole is the value of whichever action is evaluated.

Example: Print out a random maze.

  int a[1817];main(z,p,q,r) {for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)
  q=3&(r=time(0)+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79
  :0:p>158?-79:0,q?!a[p+q*2]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)
  printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

For iteration the most important is the 'for' statement.

  for (start; condition; iteration) {body;}

Here any of the three statements 'start', 'condition', and 'iteration' may be empty. 'body' may also be omitted. Since statements may be expressions separated by commas it is easy to see that a single for loop may involve substantial programming effort. To print numbers 1 to 10 use the loop

  for (i=1; i <= 10; i++) printf("%d\n", i);

Example: 800 digits of Pi.

  int a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;
  for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,
  f[b]=d%--g,d/=g--,--b;d*=b);}

It is also possible to use a simple form of iteration with 'while'. The syntax is:-

  while (condition) body

Here body is a single statement, or a compound statement within braces. There is also a 'do' ... 'while' construction which tests the condition after the body instead of before it, but in practice it is not used so often.

The body of a loop is often a series of statements and there are keywords to indicate whether all of these statements are performed on each iteration. Use 'break' to force early exit from the loop and use 'continue' to skip a series of statements.

Example: find the position of an integer k in an array A of size N. If the value is not present return N.

  int A[N], i, k;
  for (i=0; i<N; i++)
      if (k==A[i]) break;
  .... more

C also has a 'goto' statement. Since about the 1960s the use of 'goto' has been discouraged. Dijkstra, one of the pioneers of proving the correctness of programs, showed that programs extensively using this construct could almost never be proved to be correct. The 'goto' instruction requires the definition of labels. These are normally placed at the beginning of a line, and a label consists of a name followed by a colon. It is possible to rewrite the above example.

  int A[N], i, k;
  for (i=0; i<N; i++)
      if (k==A[i]) goto MORE;
  MORE: .... more

Multiple conditions may be tested in a 'switch' statement. Switch makes use of a control variable, and individual values of this variable are tested. The syntax is:-

  switch (condition) {
  case value_1:       action_1;
      break;
  case value_2:       action_2;
      break;
  .....
  case value_n:       action_n;
      break;
  default:            action_default;
  }

Here condition is an expression yielding an integer. The values to be tested are integer constants such as numeric or character literals. The 'break' keyword is used to indicate exit from the switch. If omitted program control passes to the next action in sequence.

These simple control structures serve most programming needs but they are not very good for writing mouse drivers and suchlike.

Functions

Functions are logical divisions of the program. Normally functions return a value. A function consists of a header line, then statements between braces '{' and '}'. Parameter passing sometimes looks very obscure. This is especially so with varargs functions. Right from the beginning C contained some very powerful functions: 'fork' would clone a running version of the same program.

Example:

  add(a,b){return a+b;}

  printf("2+5=%d\n", add(2,5));

  2+5=7

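The word int is supplied by the compiler for both the return value and argument types. Compilers following the C99 standard or later no longer do this and will warn about or reject the omission.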

The function header has evolved since C was invented. At the beginning the function header assumed a return type of 'int', and the parameter types were specified in a list after the closing parenthesis. The function add would be written in full:

  int add (a, b)
  int a;
  int b;
  {
      return a+b;
  }

Later it was decided to allow type specifiers within the parentheses.

  int add(int a, int b)
  { return a+b;}

Within the body of a function it is possible to exit by using the 'return' statement. 'return' is normally followed by an expression, which is the value used by the calling program. It is not always necessary to use the return value, and sometimes it is undesirable to do so.

Function definition tells the compiler the number and the types of the parameters. The use of a function consists of writing its name followed by an argument list enclosed in parentheses. The arguments are separated by commas.

  Example:

  #include <stdio.h>

  int add(int a, int b);      /* declare add() before it is used */

  int main(void)
  {
  int x = 2;
  int y = 3;
  printf ("x=%d y=%d x+y=%d\n", x, y, add(x, y));
  return 0;
  }

  int add(int a, int b) {return a+b;}

Here the function add() is defined in the program, and printf() is a library function. Printf itself is written in C and different C compilers provide slightly different versions of this function which vary in functionality. The most important thing about printf is that it has a variable number of arguments.

When a program is compiled it may not always be possible to tell if the number of arguments used in a function call is the same as that provided for in the function definition. One of the first ever programs written in C did this sort of checking. It is called 'lint', with the idea that lint is the sort of fluff that accumulates on bedding etc. and is host to potentially lethal dust-mites, pathogenic fungi and animal droppings. Lint scans C source files and reports possible misuse of functions and pointers. Nowadays hardly anyone uses it. Lint often produces voluminous output with many spurious warnings. Compilers have also improved over time and asking for all warnings will give lots of hints about potential causes of difficulty in the program.

C was designed to use certain conventions in parameter passing.

  • single bytes are converted to 'int'.
  • floating point values are converted to 'double'.
  • arrays are passed by reference.

These concepts are always hard to learn for the first time. It usually takes several weeks to really understand what is going on. There are many widely used programming languages that have none of the flexibility of C while other languages have far more elaborate schemes for defining and calling functions.

In practical terms the conversion of 'float' to 'double' at interfaces means that a programmer can always use 'double' for large numbers and forget about the 'float' data type. Similarly routines which take single characters as arguments can be coded to expect an 'int'.

Pointers

In C pointers are references to memory locations, and therefore are not meant to be the same as integers. In fact pointers share many properties with unsigned integers.

  • Zero is special. It is called NULL.
  • Pointers are totally ordered.
  • Some pointers have successors and predecessors.
  • Incorrect operations on pointers crash the program.

A set S is totally ordered if for any two members a,b in S at least one of the statements a>=b or a<=b is true. The NULL value of a pointer can be used as 'false' in logical expressions.

The '&' operator gets the pointer corresponding to a data item, while the '*' operator gets the value corresponding to a pointer. The double use of '&' and '*' can be parsed because in the context of pointer reference '*' and '&' are unary operators.

Almost everyone finds pointers hard to understand at first.

Sometimes a programmer wants to use a pointer which can serve any data type. To do this use the keyword 'void'.

The address space for a program is modelled as a set of numeric ranges representing positions of instructions and data in the computer. Normally this is not contiguous: there is more than one range. Instructions that reference sensitive parts of address space such as input-output buffers or a memory mapped video are usually hidden in library functions.

Structures & Classes

Structures are ways of building up more complex data structures from the atomic classes and pointers. They are introduced by the keyword 'struct'. Unions represent areas of memory which may contain different structures at different stages of processing. The length of every structure may be determined by the 'sizeof' operator.

A class is a collection of definitions of data and functions.

The syntax of a structure definition is:-

  struct name_of_structure {
      type_1  member_1;
      type_2  member_2;
      .....
      type_n  member_n;
      };

Input & Output

C programs assume three channels: standard input, standard output and standard error. These things are normally fairly complicated so details may be hidden in <stdio.h> where some things which look like functions are really macros. In particular getting single keystrokes from the user is a nightmare because it may be necessary to time the user out, or accept keyboard input when the program wants to do something else. Getting mouse pointer and button signals is even worse.

Writing C programs with bulk input and output is not difficult.

'printf' serves for almost all output while 'getc' or 'fgets' serve to provide for nearly all input possibilities. Highly structured data such as the output of programs can be read with the 'scanf' function.

ioctls, termios and sockets provided all necessary tools for building the internet.

Unix & Linux

UNIX introduced C and the idea of input output redirection and pipes. UNIX was made to be an interactive operating system where every user could call for operating system resources. Most management was very hostile to this idea at the time.

Simple sequences of programs can be entered from the command line.

Example:

  vi hello.c
  cc hello.c -o hello
  chmod a+x hello
  ./hello

Management thought the data on the computer might get corrupted if programmers actually got access to the computer. Remember that Ada actually died before she got access to the machine. Worse still, her name has been given to a language, Ada, mostly used in sinister military applications.

UNIX was developed so that AT&T could print better manuals and also to allow many people to access the same computer simultaneously. Three levels of privacy were guaranteed right from the beginning.

This is reflected in the file system. The permissions hierarchy is divided into owner, group, and all. The unwary user will find that most problems are caused by getting the permissions wrong.

C Libraries And Include Files

Most large systems such as UNIX, LINUX, X-Windows, have hundreds of include files and more or less standard libraries with subtle and insidious differences between different versions of operating systems and compilers. Many of these differences are not documented and only discovered when someone is trying to convert code from one machine to another, and working to a very strict deadline.

A particularly shocking example is memcpy(dst, src, size) where certain Microsoft Compilers caused a 64 kilobyte block move when a size of zero was used.

Libraries are collections of functions in files which may be run by several programs. Libraries need linking to these programs and the links may be either static or dynamic. With the internet the libraries may reside on a different machine to the program and their correctness cannot be deduced until the program is actually run. Big problems arise with new programs written for old libraries and vice versa.

Compilation Process. Make, Imake & Ide

The edit, compile and run cycle has been described for a LINUX user with access to a shell window. When more complicated programs are developed it may be important to keep a record of how programs are compiled and linked. This is done with Makefiles. MAKE is a program that checks the sources of a given program and whether any of them are more recent than the program itself. If this is the case then make rebuilds the program by compiling the necessary routines. With large systems, including almost all window managers, the maintenance of Makefiles became increasingly tedious and many new languages have been invented to generate Makefiles. Another tendency has been the replacement of Makefiles by project files within an INTEGRATED DEVELOPMENT ENVIRONMENT (IDE).

The IDE requires a very thick book with hundreds of screen diagrams to describe. It also eats up space and it is not essential for the operation of the compiler.

Getting A Job As A C Programmer

Now you have read this far go out and buy Donald E Knuth's opus, The Art of Computer Programming (Addison Wesley). Read at least three volumes and send your resume to Bill Gates. Bill will be pleased to hear from you.

Installing A C Compiler

Most computers don't come with a C compiler. It's necessary to install the software via the internet, or from a CD-ROM. Most distributions allow for the installation of C, C++ and Objective-C. It's possible to pay for a compiler, or get one for free. The free versions are generally well tested. The compiler and necessary software will usually take over 10Gb, and comprise hundreds of files. Generally the C compiler itself was used to write all of the software that gets installed here. The main components are the compiler and assembler, library manipulation utilities, include files and documentation. A minimal working set will include packages such as gcc3, libc, kernel headers and man-pages.

Appendix 1: C Keywords

Here is a list of keywords sorted by frequency from C source for the editor which was used to type this document.

   if          1250|#include     102|long          23|#ifndef        5
   int          975|switch       102|sizeof        21|ifndef         4
   return       880|default       90|do            20|short          4
   break        522|double        78|register      13|signal         4
   case         521|char          61|union         11|#if            3
   #define      496|#endif        60|continue      10|global         2
   for          400|#ifdef        53|ifdef         10|void           0
   else         334|unsigned      40|exit           9|static         0
   extern       206|while         34|#elif          8|float          0
   struct       145|#else         26|goto           6|enum           0

Appendix 2: References

Part of this section is taken from Sunil Rao's work. The FAQ also gives details of where to download free C compilers. The DJGPP compiler for DOS/WINDOWS contains very good documentation including a complete specification of all of the standard library functions.

[0] Sunil Rao sunil.rao@ic.ac.uk Learn C, C++ FAQ
http://www.raos.demon.co.uk/acllc-c++/faq.html

Steve Summit http://www.eskimo.com/~scs/cclass/cclass.html
Ted Jensen http://pweb.netcom.com/~tjensen/ptr/cpoint.htm
Tom Torfs http://members.xoom.com/tomtorfs/cintro.html

[1] Kernighan & Ritchie The C Programming Language (c 1978)
http://cm.bell-labs.com/cm/cs/cbook/

[2] D.E.Knuth The Art of Computer Programming. (c 1969 –)
Addison Wesley

K N King “C Programming: A Modern Approach”
http://knking.com/books/c/

H M Deitel and P J Deitel “C - How to Program”, 2nd Edition
http://www.deitel.com/products_and_services/publications/chtp2.htm


lynplexc/tutorial/c20.txt · Last modified: 2011/01/29 12:52 (external edit)