Please Enter Your Search Term Below:
 Websearch   Directory   Dictionary   FactBook 
  Wikipedia: Tagged union

Wikipedia: Tagged union
Tagged union
From Wikipedia, the free encyclopedia.

In computer science, a tagged union, also called a variant, variant record, or disjoint union, is a data structure used to hold a value that could take on several different, but fixed types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type which has several "cases," each of which should be handled correctly when that type is manipulated. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

Tagged unions are most important in functional languages such as ML and Haskell, where they are called datatypes and the compiler is able to verify that all cases of a tagged union are always handled, avoiding many types of errors. They can, however, be constructed in nearly any language, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly keep track of which member of the union is currently in use.

Tagged unions are often accompanied by the concept of a constructor, which is similar but not the same as a constructor for a class. Constructors produce a tagged union value, given the initial tag value and a value of the corresponding type.

Mathematically, tagged unions correspond to disjoint unions, usually written using +. Given an element of a disjoint union A + B, it is possible to determine whether it came from A or B. If an element lies in both, there will be two effectively distinct copies of the value in A + B, one from A and one from B.

Advantages and disadvantages

The primary advantage of a tagged union over an untagged union is that all accesses are safe, and the compiler can even check that all cases are handled. Untagged unions depend on program logic to correctly identify the currently active field, which may result in strange behavior and hard-to-find bugs if that logic fails.

The primary advantage of a tagged union over a simple record containing a field for each type is that it saves storage by overlapping storage for all the types. Some implementations reserve enough storage for the largest type, while others dynamically adjust the size of a tagged union value as needed. When the value is immutable, or cannot be changed, it is simple to allocate just as much storage as is needed.

The main disadvantage of tagged unions is that the tag occupies space. Since there are usually a small number of alternatives, the tag can often be squeezed into 2 or 3 bits wherever space can be found, but sometimes even these bits are not available. In this case, a helpful alternative may be folded or encoded tags, where the tag value is dynamically computed from the contents of the union field. A particularly common example of this is the tagged pointer.

Sometimes, untagged unions are used to perform bit-level conversions between types, called reinterpret casts in C++. Tagged unions are not intended for this purpose; typically a new value is assigned whenever the tag is changed.

Many languages support, to some extent, a universal data type, which is a type that includes every value of every other type, and often a way is provided to test the actual type of a value of the universal type. While comparable to a tagged union in some respects, typical tagged unions include a relatively small number of cases, and these cases form different ways of expressing a single coherent concept, such as a data structure node or instruction. Also, there is an expectation that every case of a tagged union will be dealt with when it is used. The values of a universal data type are not related and there is no feasible way to deal with them all.

Examples

Say we wanted to build a binary tree of integers. In ML, we would do this by creating a datatype like this:

datatype tree = Leaf
             | Node of (int * tree * tree)

This is a tagged union with two cases: one, the leaf, is used to terminate a path of the tree, and functions much like a null value would in imperative languages. The other branch holds a node, which contains an integer and a left and right subtree. Leaf and Node are the constructors, which enable us to actually produce a particular tree, such as:

Node(5, Node(1,Leaf,Leaf), Node(3, Leaf, Node(4, Leaf, Leaf)))

which corresponds to this tree:

Now we can easily write a typesafe function that, say, counts the number of nodes in the tree:

fun countNodes(Leaf) = 0
 | countNodes(Node(int,left,right)) =
     1 + countNodes(left) + countNodes(right)

Language support

Although primarily only functional languages such as ML and Haskell give a central role to tagged unions and have the power to check that all cases are handled, other languages have support for tagged unions as well.

Pascal, Ada, and Modula-2 call them variant records, and require the tag field to be manually created and the tag values specified, as in this Pascal example:

type shapeKind = (square, rectangle, circle);
    shape = record
               centerx : integer;
               centery : integer;
               case kind : shape of
                  square : (side : integer);
                  rectangle : (length, height : integer);
                  circle : (radius : integer);
end;

In Algol, tagged unions are called united modes, the tag is implicit, and the case construct is used to determine which field is tagged:

mode ib = union(int, bool)
ib x
case x in
 (int i):   process_integer(i)
 (bool b):  process_boolean(b)
esac

In C and C++, a tagged union can be created from untagged unions using a strict access discipline where the tag is always checked:

enum ShapeKind { Square, Rectangle, Circle };

struct Shape {

   int centerx;
   int centery;
   enum ShapeKind kind;
   union {
       struct { int side; }           squareData;
       struct { int length, height; } rectangleData;
       struct { int radius; }         circleData;
   } shapeKindData;
};

int getSquareSide(struct Shape* s) {

   ASSERT(s->kind == Square);
   return s->shapeKindData.squareData.side;
}

void setSquareSide(struct Shape* s, int side) {

   s->kind = Square;
   s->shapeKindData.squareData.side = side;
}

/* and so on */

Here, ASSERT is a macro that aborts or throws an exception if the condition is not satisfied. As long as the union fields are only accessed through the functions, the accesses will be safe and correct.

C and C++ also have language support for one particular tagged union: the possibly-null pointer. This may be compared to the option type in ML, and has two possible values, either an integer pointer value, or a special null value indicating an exceptional condition. Unfortunately, C compilers do not verify that the null case is always checked for, and this is a particularly prevalent source of bugs in C code, since there is a tendency to ignore exceptional cases.

One advanced dialect of C called Cyclone has extensive built-in support for tagged unions. See the tagged union section of the on-line manual for more information.


  

From Wikipedia, the free encyclopedia. 
Modified by Geona