postgresql/doc/src/sgml/ref/create_type.sgml

<!--
$PostgreSQL: pgsql/doc/src/sgml/ref/create_type.sgml,v 1.70 2007/05/11 17:57:11 tgl Exp $
PostgreSQL documentation
-->

<refentry id="SQL-CREATETYPE">
 <refmeta>
  <refentrytitle id="sql-createtype-title">CREATE TYPE</refentrytitle>
  <refmiscinfo>SQL - Language Statements</refmiscinfo>
 </refmeta>

 <refnamediv>
  <refname>CREATE TYPE</refname>
  <refpurpose>define a new data type</refpurpose>
 </refnamediv>

 <indexterm zone="sql-createtype">
  <primary>CREATE TYPE</primary>
 </indexterm>

 <refsynopsisdiv>
<synopsis>
CREATE TYPE <replaceable class="parameter">name</replaceable> AS
    ( <replaceable class="PARAMETER">attribute_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [, ... ] )

CREATE TYPE <replaceable class="parameter">name</replaceable> AS ENUM
    ( '<replaceable class="parameter">label</replaceable>' [, ... ] )

CREATE TYPE <replaceable class="parameter">name</replaceable> (
    INPUT = <replaceable class="parameter">input_function</replaceable>,
    OUTPUT = <replaceable class="parameter">output_function</replaceable>
    [ , RECEIVE = <replaceable class="parameter">receive_function</replaceable> ]
    [ , SEND = <replaceable class="parameter">send_function</replaceable> ]
    [ , TYPMOD_IN = <replaceable class="parameter">type_modifier_input_function</replaceable> ]
    [ , TYPMOD_OUT = <replaceable class="parameter">type_modifier_output_function</replaceable> ]
    [ , ANALYZE = <replaceable class="parameter">analyze_function</replaceable> ]
    [ , INTERNALLENGTH = { <replaceable class="parameter">internallength</replaceable> | VARIABLE } ]
    [ , PASSEDBYVALUE ]
    [ , ALIGNMENT = <replaceable class="parameter">alignment</replaceable> ]
    [ , STORAGE = <replaceable class="parameter">storage</replaceable> ]
    [ , DEFAULT = <replaceable class="parameter">default</replaceable> ]
    [ , ELEMENT = <replaceable class="parameter">element</replaceable> ]
    [ , DELIMITER = <replaceable class="parameter">delimiter</replaceable> ]
)

CREATE TYPE <replaceable class="parameter">name</replaceable>
</synopsis>
 </refsynopsisdiv>

 <refsect1>
  <title>Description</title>

  <para>
   <command>CREATE TYPE</command> registers a new data type for use in
   the current database.  The user who defines a type becomes its
   owner.
  </para>

  <para>
   If a schema name is given then the type is created in the specified
   schema.  Otherwise it is created in the current schema.  The type
   name must be distinct from the name of any existing type or domain
   in the same schema.  (Because tables have associated data types,
   the type name must also be distinct from the name of any existing
   table in the same schema.)
  </para>

  <refsect2>
   <title>Composite Types</title>

  <para>
   The first form of <command>CREATE TYPE</command>
   creates a composite type.
   The composite type is specified by a list of attribute names and data types.
   This is essentially the same as the row type
   of a table, but using <command>CREATE TYPE</command> avoids the need to
   create an actual table when all that is wanted is to define a type.
   A stand-alone composite type is useful as the argument or return type of a
   function.
  </para>
  </refsect2>

  <refsect2>
   <title>Enumerated Types</title>

   <para>
    The second form of <command>CREATE TYPE</command> creates an enumerated
    (enum) type, as described in <xref linkend="datatype-enum">.
    Enum types take a list of one or more quoted labels, each of which
    must be less than <symbol>NAMEDATALEN</symbol> bytes long (64 in a standard
    <productname>PostgreSQL</productname> build).
   </para>
  </refsect2>

  <refsect2>
   <title>Base Types</title>

  <para>
   The third form of <command>CREATE TYPE</command> creates a new base type
   (scalar type).  The parameters can appear in any order, not only that
   illustrated above, and most are optional.  You must register
   two or more functions (using <command>CREATE FUNCTION</command>) before
   defining the type.  The support functions
   <replaceable class="parameter">input_function</replaceable> and
   <replaceable class="parameter">output_function</replaceable>
   are required, while the functions
   <replaceable class="parameter">receive_function</replaceable>,
   <replaceable class="parameter">send_function</replaceable>,
   <replaceable class="parameter">type_modifier_input_function</replaceable>,
   <replaceable class="parameter">type_modifier_output_function</replaceable> and
   <replaceable class="parameter">analyze_function</replaceable>
   are optional.  Generally these functions have to be coded in C
   or another low-level language.
  </para>

  <para>
   The <replaceable class="parameter">input_function</replaceable>
   converts the type's external textual representation to the internal
   representation used by the operators and functions defined for the type.
   <replaceable class="parameter">output_function</replaceable>
   performs the reverse transformation.  The input function can be
   declared as taking one argument of type <type>cstring</type>,
   or as taking three arguments of types
   <type>cstring</type>, <type>oid</type>, <type>integer</type>.
   The first argument is the input text as a C string, the second
   argument is the type's own OID (except for array types, which instead
   receive their element type's OID),
   and the third is the <literal>typmod</> of the destination column, if known
   (-1 will be passed if not).
   The input function must return a value of the data type itself.
   Usually, an input function should be declared STRICT; if it is not,
   it will be called with a NULL first parameter when reading a NULL
   input value.  The function must still return NULL in this case, unless
   it raises an error.
   (This case is mainly meant to support domain input functions, which
   might need to reject NULL inputs.)
   The output function must be
   declared as taking one argument of the new data type.
   The output function must return type <type>cstring</type>.
   Output functions are not invoked for NULL values.
  </para>

  <para>
   The optional <replaceable class="parameter">receive_function</replaceable>
   converts the type's external binary representation to the internal
   representation.  If this function is not supplied, the type cannot
   participate in binary input.  The binary representation should be
   chosen to be cheap to convert to internal form, while being reasonably
   portable.  (For example, the standard integer data types use network
   byte order as the external binary representation, while the internal
   representation is in the machine's native byte order.)  The receive
   function should perform adequate checking to ensure that the value is
   valid.
   The receive function can be declared as taking one argument of type
   <type>internal</type>, or as taking three arguments of types
   <type>internal</type>, <type>oid</type>, <type>integer</type>.
   The first argument is a pointer to a <type>StringInfo</type> buffer
   holding the received byte string; the optional arguments are the
   same as for the text input function.
   The receive function must return a value of the data type itself.
   Usually, a receive function should be declared STRICT; if it is not,
   it will be called with a NULL first parameter when reading a NULL
   input value.  The function must still return NULL in this case, unless
   it raises an error.
   (This case is mainly meant to support domain receive functions, which
   might need to reject NULL inputs.)
   Similarly, the optional
   <replaceable class="parameter">send_function</replaceable> converts
   from the internal representation to the external binary representation.
   If this function is not supplied, the type cannot participate in binary
   output.  The send function must be
   declared as taking one argument of the new data type.
   The send function must return type <type>bytea</type>.
   Send functions are not invoked for NULL values.
  </para>

  <para>
   You should at this point be wondering how the input and output functions
   can be declared to have results or arguments of the new type, when they
   have to be created before the new type can be created.  The answer is that
   the type should first be defined as a <firstterm>shell type</>, which is a
   placeholder type that has no properties except a name and an owner.  This
   is done by issuing the command <literal>CREATE TYPE
   <replaceable>name</></literal>, with no additional parameters.  Then the
   I/O functions can be defined referencing the shell type.  Finally,
   <command>CREATE TYPE</> with a full definition replaces the shell entry
   with a complete, valid type definition, after which the new type can be
   used normally.
  </para>

  <para>
   The optional
   <replaceable class="parameter">type_modifier_input_function</replaceable>
   and <replaceable class="parameter">type_modifier_output_function</replaceable>
   are needed if the type supports modifiers, that is optional constraints
   attached to a type declaration, such as <literal>char(5)</> or
   <literal>numeric(30,2)</>.  <productname>PostgreSQL</productname> allows
   user-defined types to take one or more integer constants as modifiers;
   however, this information must be capable of being packed into a single
   non-negative integer value for storage in the system catalogs.  The
   <replaceable class="parameter">type_modifier_input_function</replaceable>
   is passed the declared modifier(s) in the form of an <type>integer</>
   array.  It must check the values for validity (throwing an error if they
   are wrong), and if they are correct, return a single non-negative
   <type>integer</> value that will be stored as the column <quote>typmod</>.
   Type modifiers will be rejected if the type does not have a
   <replaceable class="parameter">type_modifier_input_function</replaceable>.
   The <replaceable class="parameter">type_modifier_output_function</replaceable>
   converts the internal integer typmod value back to the correct form for
   user display.  It must return a <type>cstring</> value that is the exact
   string to append to the type name; for example <type>numeric</>'s
   function might return <literal>(30,2)</>.
   It is allowed to omit the
   <replaceable class="parameter">type_modifier_output_function</replaceable>,
   in which case the default display format is just the stored typmod value
   enclosed in parentheses.
  </para>

  <para>
   The optional <replaceable class="parameter">analyze_function</replaceable>
   performs type-specific statistics collection for columns of the data type.
   By default, <command>ANALYZE</> will attempt to gather statistics using
   the type's <quote>equals</> and <quote>less-than</> operators, if there
   is a default b-tree operator class for the type.  For non-scalar types
   this behavior is likely to be unsuitable, so it can be overridden by
   specifying a custom analysis function.  The analysis function must be
   declared to take a single argument of type <type>internal</>, and return
   a <type>boolean</> result.  The detailed API for analysis functions appears
   in <filename>src/include/commands/vacuum.h</>.
  </para>

  <para>
   While the details of the new type's internal representation are only
   known to the I/O functions and other functions you create to work with
   the type, there are several properties of the internal representation
   that must be declared to <productname>PostgreSQL</productname>.
   Foremost of these is
   <replaceable class="parameter">internallength</replaceable>.
   Base data types can be fixed-length, in which case
   <replaceable class="parameter">internallength</replaceable> is a
   positive integer, or variable  length, indicated by setting
   <replaceable class="parameter">internallength</replaceable>
   to <literal>VARIABLE</literal>.  (Internally, this is represented
   by setting <literal>typlen</> to -1.)  The internal representation of all
   variable-length types must start with a 4-byte integer giving the total
   length of this value of the type.
  </para>

  <para>
   The optional flag <literal>PASSEDBYVALUE</literal> indicates that
   values of this data type are passed by value, rather than by
   reference.  You cannot pass by value types whose internal
   representation is larger than the size of the <type>Datum</> type
   (4 bytes on most machines, 8 bytes on a few).
  </para>

  <para>
   The <replaceable class="parameter">alignment</replaceable> parameter
   specifies the storage alignment required for the data type.  The
   allowed values equate to alignment on 1, 2, 4, or 8 byte boundaries.
   Note that variable-length types must have an alignment of at least
   4, since they necessarily contain an <type>int4</> as their first component.
  </para>

  <para>
   The <replaceable class="parameter">storage</replaceable> parameter
   allows selection of storage strategies for variable-length data
   types.  (Only <literal>plain</literal> is allowed for fixed-length
   types.)  <literal>plain</literal> specifies that data of the type
   will always be stored in-line and not compressed.
   <literal>extended</literal> specifies that the system will first
   try to compress a long data value, and will move the value out of
   the main table row if it's still too long.
   <literal>external</literal> allows the value to be moved out of the
   main table, but the system will not try to compress it.
   <literal>main</literal> allows compression, but discourages moving
   the value out of the main table.  (Data items with this storage
   strategy might still be moved out of the main table if there is no
   other way to make a row fit, but they will be kept in the main
   table preferentially over <literal>extended</literal> and
   <literal>external</literal> items.)
  </para>

  <para>
   A default value can be specified, in case a user wants columns of the
   data type to default to something other than the null value.
   Specify the default with the <literal>DEFAULT</literal> key word.
   (Such a default can be overridden by an explicit <literal>DEFAULT</literal>
   clause attached to a particular column.)
  </para>

  <para>
   To indicate that a type is an array, specify the type of the array
   elements using the <literal>ELEMENT</> key word.  For example, to
   define an array of 4-byte integers (<type>int4</type>), specify
   <literal>ELEMENT = int4</literal>. More details about array types
   appear below.
  </para>

  <para>
   To indicate the delimiter to be used between values in the external
   representation of arrays of this type, <replaceable
   class="parameter">delimiter</replaceable> can be
   set to a specific character.  The default delimiter is the comma
   (<literal>,</literal>).  Note that the delimiter is associated
   with the array element type, not the array type itself.
  </para>

  </refsect2>

  <refsect2>
   <title>Array Types</title>

   <para>
    Whenever a user-defined type is created,
    <productname>PostgreSQL</productname> automatically creates an
    associated array type, whose name consists of the base type's
    name prepended with an underscore, and truncated if necessary to keep
    it less than <symbol>NAMEDATALEN</symbol> bytes long.  (If the name
    so generated collides with an existing type name, the process is
    repeated until a non-colliding name is found.)
    This implicitly-created array type is variable length and uses the
    built-in input and output functions <literal>array_in</> and
    <literal>array_out</>.  The array type tracks any changes in its
    element type's owner or schema, and is dropped if the element type is.
   </para>

   <para>
    You might reasonably ask why there is an <option>ELEMENT</>
    option, if the system makes the correct array type automatically.
    The only case where it's useful to use <option>ELEMENT</> is when you are
    making a fixed-length type that happens to be internally an array of a number of
    identical things, and you want to allow these things to be accessed
    directly by subscripting, in addition to whatever operations you plan
    to provide for the type as a whole.  For example, type <type>point</>
    is represented as just two floating-point numbers, which it allows to be
    accessed as <literal>point[0]</> and <literal>point[1]</>.
    Note that
    this facility only works for fixed-length types whose internal form
    is exactly a sequence of identical fixed-length fields.  A subscriptable
    variable-length type must have the generalized internal representation
    used by <literal>array_in</> and <literal>array_out</>.
    For historical reasons (i.e., this is clearly wrong but it's far too
    late to change it), subscripting of fixed-length array types starts from
    zero, rather than from one as for variable-length arrays.
   </para>
  </refsect2>
 </refsect1>

 <refsect1>
  <title>Parameters</title>

  <variablelist>
   <varlistentry>
    <term><replaceable class="parameter">name</replaceable></term>
    <listitem>
     <para>
      The name (optionally schema-qualified) of a type to be created.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">attribute_name</replaceable></term>
    <listitem>
     <para>
      The name of an attribute (column) for the composite type.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">data_type</replaceable></term>
    <listitem>
     <para>
      The name of an existing data type to become a column of the
      composite type.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">label</replaceable></term>
    <listitem>
     <para>
      A string literal representing the textual label associated with
      one value of an enum type.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">input_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts data from the type's
      external textual form to its internal form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">output_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts data from the type's
      internal form to its external textual form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">receive_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts data from the type's
      external binary form to its internal form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">send_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts data from the type's
      internal form to its external binary form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">type_modifier_input_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts numeric modifier(s) for the type
      into internal form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">type_modifier_output_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that converts the internal form of the type's
      modifier(s) to external textual form.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">analyze_function</replaceable></term>
    <listitem>
     <para>
      The name of a function that performs statistical analysis for the
      data type.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">internallength</replaceable></term>
    <listitem>
     <para>
      A numeric constant that specifies the length in bytes of the new
      type's internal representation.  The default assumption is that
      it is variable-length.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">alignment</replaceable></term>
    <listitem>
     <para>
      The storage alignment requirement of the data type.  If specified,
      it must be <literal>char</literal>, <literal>int2</literal>,
      <literal>int4</literal>, or <literal>double</literal>; the
      default is <literal>int4</literal>.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">storage</replaceable></term>
    <listitem>
     <para>
      The storage strategy for the data type.  If specified, must be
      <literal>plain</literal>, <literal>external</literal>,
      <literal>extended</literal>, or <literal>main</literal>; the
      default is <literal>plain</literal>.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">default</replaceable></term>
    <listitem>
     <para>
      The default value for the data type.  If this is omitted, the
      default is null.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">element</replaceable></term>
    <listitem>
     <para>
      The type being created is an array; this specifies the type of
      the array elements.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term><replaceable class="parameter">delimiter</replaceable></term>
    <listitem>
     <para>
      The delimiter character to be used between values in arrays made
      of this type.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>
 </refsect1>

 <refsect1 id="SQL-CREATETYPE-notes">
  <title>Notes</title>

  <para>
   It is best to avoid using type names that begin with the underscore
   character (<literal>_</literal>).  <productname>PostgreSQL</productname>
   forms the name of an array type by prepending one or more underscores
   to the element type's name, and these names may collide with user-defined
   type names that begin with underscore.  While the system will modify
   generated array type names to avoid collisions, this does not help if the
   conflicting array type already exists when you try to create your type.
   Also, various old client software may assume that names beginning with
   underscores always represent arrays.
  </para>

  <para>
   Because there are no restrictions on use of a data type once it's been
   created, creating a base type is tantamount to granting public execute
   permission on the functions mentioned in the type definition.  (The creator
   of the type is therefore required to own these functions.)  This is usually
   not an issue for the sorts of functions that are useful in a type
   definition.  But you might want to think twice before designing a type
   in a way that would require <quote>secret</> information to be used
   while converting it to or from external form.
  </para>

  <para>
   Before <productname>PostgreSQL</productname> version 8.2, the syntax
   <literal>CREATE TYPE <replaceable>name</></literal> did not exist.
   The way to create a new base type was to create its input function first.
   In this approach, <productname>PostgreSQL</productname> will first see
   the name of the new data type as the return type of the input function.
   The shell type is implicitly created in this situation, and then it
   can be referenced in the definitions of the remaining I/O functions.
   This approach still works, but is deprecated and might be disallowed in
   some future release.  Also, to avoid accidentally cluttering
   the catalogs with shell types as a result of simple typos in function
   definitions, a shell type will only be made this way when the input
   function is written in C.
  </para>

  <para>
   In <productname>PostgreSQL</productname> versions before 7.3, it
   was customary to avoid creating a shell type at all, by replacing the
   functions' forward references to the type name with the placeholder
   pseudotype <type>opaque</>.  The <type>cstring</> arguments and
   results also had to be declared as <type>opaque</> before 7.3.  To
   support loading of old dump files, <command>CREATE TYPE</> will
   accept I/O functions declared using <type>opaque</>, but it will issue
   a notice and change the function declarations to use the correct
   types.
  </para>

 </refsect1>

 <refsect1>
  <title>Examples</title>

  <para>
   This example creates a composite type and uses it in
   a function definition:
<programlisting>
CREATE TYPE compfoo AS (f1 int, f2 text);

CREATE FUNCTION getfoo() RETURNS SETOF compfoo AS $$
    SELECT fooid, fooname FROM foo
$$ LANGUAGE SQL;
</programlisting>
  </para>

  <para>
   This example creates an enumerated type and uses it in
   a table definition:
<programlisting>
CREATE TYPE bug_status AS ENUM ('new', 'open', 'closed');

CREATE TABLE bug (
    serial id,
    description text,
    status bug_status
);
</programlisting>
  </para>

  <para>
   This example creates the base data type <type>box</type> and then uses the
   type in a table definition:
<programlisting>
CREATE TYPE box;

CREATE FUNCTION my_box_in_function(cstring) RETURNS box AS ... ;
CREATE FUNCTION my_box_out_function(box) RETURNS cstring AS ... ;

CREATE TYPE box (
    INTERNALLENGTH = 16,
    INPUT = my_box_in_function,
    OUTPUT = my_box_out_function
);

CREATE TABLE myboxes (
    id integer,
    description box
);
</programlisting>
  </para>

  <para>
   If the internal structure of <type>box</type> were an array of four
   <type>float4</> elements, we might instead use:
<programlisting>
CREATE TYPE box (
    INTERNALLENGTH = 16,
    INPUT = my_box_in_function,
    OUTPUT = my_box_out_function,
    ELEMENT = float4
);
</programlisting>
   which would allow a box value's component numbers to be accessed
   by subscripting.  Otherwise the type behaves the same as before.
  </para>

  <para>
   This example creates a large object type and uses it in
   a table definition:
<programlisting>
CREATE TYPE bigobj (
    INPUT = lo_filein, OUTPUT = lo_fileout,
    INTERNALLENGTH = VARIABLE
);
CREATE TABLE big_objs (
    id integer,
    obj bigobj
);
</programlisting>
  </para>

  <para>
   More examples, including suitable input and output functions, are
   in <xref linkend="xtypes">.
  </para>
 </refsect1>

 <refsect1 id="SQL-CREATETYPE-compatibility">
  <title>Compatibility</title>

  <para>
   This <command>CREATE TYPE</command> command is a
   <productname>PostgreSQL</productname> extension.  There is a
   <command>CREATE TYPE</command> statement in the <acronym>SQL</> standard
   that is rather different in detail.
  </para>
 </refsect1>

 <refsect1 id="SQL-CREATETYPE-see-also">
  <title>See Also</title>

  <simplelist type="inline">
   <member><xref linkend="sql-createfunction" endterm="sql-createfunction-title"></member>
   <member><xref linkend="sql-droptype" endterm="sql-droptype-title"></member>
   <member><xref linkend="sql-altertype" endterm="sql-altertype-title"></member>
   <member><xref linkend="sql-createdomain" endterm="sql-createdomain-title"></member>
  </simplelist>
 </refsect1>

</refentry>