CREATE FUNCTION

Define a new function.

Summary

CREATE [OR REPLACE] FUNCTION name    
    ( [ [argmode] [argname] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
      [ RETURNS rettype 
        | RETURNS TABLE ( column_name column_type [, ...] ) ]
    { LANGUAGE langname
    | WINDOW
    | IMMUTABLE | STABLE | VOLATILE | [NOT] LEAKPROOF
    | CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT
    | NO SQL | CONTAINS SQL | READS SQL DATA | MODIFIES SQL
    | [EXTERNAL] SECURITY INVOKER | [EXTERNAL] SECURITY DEFINER
    | EXECUTE ON { ANY | MASTER | ALL SEGMENTS | INITPLAN }
    | COST execution_cost
    | SET configuration_parameter { TO value | = value | FROM CURRENT }
    | AS 'definition'
    | AS 'obj_file', 'link_symbol' } ...
    [ WITH ({ DESCRIBE = describe_function
           } [, ...] ) ]

Description

CREATE FUNCTION defines a new function. CREATE OR REPLACE FUNCTION either creates a new function or replaces an existing definition.

The name of the new function must not match any existing function with the same input parameter type in the same schema. However, functions of different parameter types may share a name (overload).

To update the definition of an existing function, use CREATE OR REPLACE FUNCTION. You cannot change the name or parameter type of a function in this way (this will actually create a new, different function). Similarly, CREATE OR REPLACE FUNCTION will not allow you to change the return type of an existing function. To do this, the function must be deleted and recreated. When using an OUT parameter, this means that you cannot change the type of any OUT parameter unless the function is deleted. If you delete and then recreate the function, you must delete the existing objects (rules, views, triggers, etc.) that reference the old function. Use CREATE OR REPLACE FUNCTION to change the function definition without destroying the objects that reference the function.

The user who creates the function will become the owner of the function.

In order to be able to create functions, you must have USAGE privileges for the parameter type and return type.

Restrict the use of VOLATILE and STABLE functions

To prevent data from becoming out of sync between various segments of the YMatrix database, if any function classified as STABLE or VOLATILE contains SQL or modifies the database in any way, the function cannot be executed at the segment level. For example, functions such as random() or timeofday() are not allowed to be executed on distributed data in YMatrix databases, as they may cause data inconsistencies between segment instances.

To ensure data consistency, VOLATILE and STABLE functions can be safely used in statements that are evaluated and executed on the master database. For example, the following statements are always executed on the master database (statements without a FROM clause):

SELECT setval('myseq', 201);
SELECT foo();

If the FROM clause of a statement contains a distributed table, and the functions used in the FROM clause return only a set of rows, you can execute on these segments:

SELECT * FROM foo();

An exception to this rule is functions that return table references (rangeFuncs) or functions that use the refCursor data type. Note that you cannot return a refcursor from any type of function in YMatrix databases.

Function Volatility and EXECUTE ON Attribute

The volatility attributes (IMMUTABLE, STABLE, VOLATILE) and the EXECUTE ON attribute specify two distinct aspects of when and where a function is executed. Typically, volatility indicates when the function is executed, while EXECUTE ON indicates where it is executed.

For example, functions defined with the IMMUTABLE attribute can be executed at query planning time, whereas functions with the VOLATILE attribute must be executed for each row in the query. Functions with the EXECUTE ON MASTER attribute are executed only on the master, while those with the EXECUTE ON ALL SEGMENTS attribute are executed only on all primary instances (not the master).

Functions and Replicated Tables

User-defined functions that only execute SELECT commands on replicated tables can run on segments. Replicated tables created with the DISTRIBUTED REPLICATED clause store all their rows on each segment. It is safe for functions to read them on segments, but updates to replicated tables must be executed on the master instance.

Parameters

  • name: The name of the function to be created (can be schema-qualified).

  • argmode: The mode of the parameter: IN, OUT, INOUT, or VARIADIC. If omitted, the default is IN. Only OUT parameters can follow a parameter declared as VARIADIC. Similarly, OUT and INOUT parameters cannot be used with the RETURNS TABLE notation.

  • argname: The name of the parameter. Some languages (currently only SQL and PL/pgSQL) allow you to reference the parameter name within the function body. For other languages, the input parameter names are just additional documentation for the function itself, but you can use them in function calls to improve readability. However, output parameter names are significant because they define the column names in the result row type. (If an output parameter name is omitted, the system will generate a default column name.)

  • argtype: The data type of the function parameter (can be schema-qualified). The parameter type can be a base type, composite type, domain type, or a reference to the type of a table column. Depending on the implementation language, you may also be able to specify pseudo-types such as cstring. Pseudo-types indicate that the actual parameter type is not fully specified or lies outside the set of ordinary SQL data types.

  • default_expr: An expression to be used as the default value for the parameter if no value is specified. The expression must be coercible to the parameter's data type. Only IN and INOUT parameters can have default values. Each subsequent input parameter in the parameter list must also have a default value if any parameter before it has one.

  • rettype: The return data type (can be schema-qualified). The return type can be a base type, composite type, domain type, or a reference to the type of a table column. Depending on the implementation language, you may also be able to specify pseudo-types such as cstring. If the function is not supposed to return a value, specify void as the return type. When there are OUT or INOUT parameters, the RETURNS clause can be omitted. If present, it must agree with the result type implied by the output parameters: RECORD if there are multiple output parameters, or the same type as the single output parameter. The SETOF modifier indicates that the function will return a set of items rather than a single item. Reference the type of a column using tablename.columnname%TYPE.

  • column_name: The name of an output column in the RETURNS TABLE syntax. This is actually an alternative way of declaring named OUT parameters, except that RETURNS TABLE also implies RETURNS SETOF.

  • column_type: The data type of an output column in the RETURNS TABLE syntax.

  • langname: The name of the language in which the function is written. This can be SQL, C, internal, or the name of a user-defined procedural language.

  • WINDOW: WINDOW indicates that the function is a window function rather than a regular function. Currently, this is only useful for functions written in C. The WINDOW attribute cannot be changed when replacing an existing function definition.

  • IMMUTABLE: Indicates that the function cannot modify the database and always returns the same result when given the same parameter values. It does not perform database lookups or use information not directly present in its parameter list. If this option is specified, any function call with fully constant arguments can be immediately replaced with the function value.

  • STABLE: Indicates that the function cannot modify the database and will consistently return the same result for the same parameter values within a single table scan, but its results may change across SQL statements. This is appropriate for functions whose results depend on database lookups, parameter values (e.g., current time zone), etc. Note that the current_timestamp family of functions qualifies as stable since their values do not change during a transaction.

  • VOLATILE: Indicates that the function value may change even within a single table scan, so no optimizations can be made. Relatively few database functions are truly volatile. Examples include random() and timeofday(). However, any function with side effects must be classified as volatile, even if its result is predictable, to prevent the call from being optimized away. An example is setval().

  • LEAKPROOF: Indicates that the function has no side effects. Besides its return value, it does not reveal any information about its parameters. Functions that throw error messages for some parameter values but not others, or that include parameter values in any error messages, are not leakproof. The query optimizer can push leakproof functions (but not others) into views created with the security_barrier option. See CREATE VIEW and CREATE RULE. This option can only be set by superusers.

  • CALLED ON NULL INPUT: (Default) The function will be normally called when some of its parameters are null. The function author is responsible for checking for nulls (if necessary) and responding appropriately.

  • RETURNS NULL ON NULL INPUT or STRICT: Indicates that the function always returns null whenever any of its parameters are null. If this option is specified, the function is not executed when null arguments are supplied; instead, a null result is assumed automatically.

  • NO SQL, CONTAINS SQL, READS SQL DATA, MODIFIES SQL DATA: These attributes inform the query optimizer whether the function contains SQL statements and, if so, whether those statements read and/or write data.

  • [EXTERNAL] SECURITY INVOKER: (Default) The function will execute with the privileges of the user who calls the function. SECURITY DEFINER specifies that the function is to be executed with the privileges of the user who created the function. The EXTERNAL keyword is allowed for SQL conformance but is optional, as this capability applies not only to external functions but to all functions.

  • EXECUTE ON ANY, EXECUTE ON MASTER, EXECUTE ON ALL SEGMENTS, EXECUTE ON INITPLAN: The EXECUTE ON attribute specifies where (on the master or segment instances) the function is to be executed when called during query execution.

  • COST execution_cost: A positive number indicating the estimated execution cost of the function, in units of cpu_operator_cost. If the function returns a set, execution_cost indicates the cost per returned row. If not specified, C-language and internal functions default to 1 unit, while functions in other languages default to 100 units. Specifying a larger execution_cost value will cause the optimizer to evaluate the function less frequently.

  • SET configuration_parameter { TO value | = value | FROM CURRENT }: When entering the function, the SET clause applies a value to the session configuration parameter. When the function exits, the configuration parameter is restored to its previous value. SET FROM CURRENT saves the current parameter value at the time of CREATE FUNCTION execution as the value to be applied when entering the function.

  • AS 'definition': Defines the function as a string constant; the meaning depends on the language. It can be the internal function name, the path to the target file, the text of an SQL command, or procedural language.

  • obj_file, link_symbol: Used for dynamically loadable C-language functions when the function name in the C source code differs from the SQL function name. The string obj_file is the name of the file containing the dynamically loadable object, and link_symbol is the name of the function in the C source code. If the link symbol is omitted, it is assumed to be the same as the SQL function name. All functions must have distinct C names, so overloaded SQL functions must be given distinct C names (e.g., using the parameter type as part of the C name). It is advisable to locate shared libraries relative to $libdir (located in $GPHOME/lib) or via the dynamic library path (set by the dynamic_library_path server configuration parameter).

  • describe_function: Resolves the name of the callback function to be executed when the query that calls this function. The callback function returns a tuple descriptor representing the result type.

Note

All compiled code (shared library files) used for custom functions must be placed in the same location on each host in the YMatrix database array (master and all segments). This location must also be in LD_LIBRARY_PATH so that the server can find the file. It is recommended to locate shared libraries on all master instances in the YMatrix array relative to $libdir (located in $GPHOME/lib) or via the dynamic library path (set by dynamic_library_path server configuration parameters).

The input parameters and return values allow the use of full SQL type syntax. However, some details of the type specification (e.g., the precision field of type numeric) are the responsibility of the underlying function implementation, and the CREATE FUNCTION command does not recognize or enforce them.

YMatrix allows function overloading. The same name can be used for multiple different functions as long as they have different input parameter types. However, all functions must have distinct C names, so overloaded C functions must be given distinct C names (e.g., using the parameter type as part of the C name).

If two functions have the same name and input parameter type, and any OUT parameters are ignored, they are considered the same. For example, these statements conflict:

CREATE FUNCTION foo(int) ...
CREATE FUNCTION foo(int, out text) ...

Functions with different parameter type lists are not considered conflicting when created, but if parameter default values are provided, they may conflict in use. For example:

CREATE FUNCTION foo(int) ...
CREATE FUNCTION foo(int, int default 42) ...

Since it is not clear which function should be called, executing foo(10) will fail.

When a duplicate CREATE FUNCTION call refers to the same target file, the file is loaded only once. To uninstall and reload the file, use the LOAD command.

You must have USAGE privileges for a language to define functions using that language.

Writing a function to define a string using dollar quotes instead of regular single quote syntax is often helpful. If you do not use dollar quotes, you must escape any single quotes or backslashes in the function definition by doubling them. A string constant referenced with a dollar consists of a dollar sign ($), an optional token of zero or more characters, another dollar sign, any sequence of characters that constitutes the content of the string, a dollar sign, and the same token as the starting dollar reference and a dollar sign. In dollar-quoted strings, single quotes, backslashes, or any other character can be used without escaping. The content of the string is always taken in its original sense. For example, the following are two ways to specify the string "Dianne's horse" using dollar quotes:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

If the SET clause is attached to a function, the SET LOCAL command executed for the same variable inside the function is limited to that function; when the function exits, the previous value of the configuration parameter is still restored. However, a normal SET command (not including LOCAL) overrides the CREATE FUNCTION SET clause, just as previous SET LOCAL commands do. Unless the current transaction rolls back, the effect of the command persists after the function exits.

If a function with VARIADIC parameters is declared as STRICT, the strictness check will test whether the variadic array is non-empty as a whole. If the array has an empty element, PL/pgSQL will still call the function.

When replacing an existing function with CREATE OR REPLACE FUNCTION, there are restrictions on changing the parameter name. You cannot change the name assigned to any input parameters (though you can add a name to a parameter that was not previously named). If there are multiple output parameters, the name of the output parameter cannot be changed, as this would alter the column name of the anonymous composite type describing the function's result. These restrictions are in place to ensure that existing calls to the function do not break when replaced.

Using Functions with Queries for Distributed Data

In some cases, if the data in the table specified in the FROM clause is distributed on YMatrix database segments, YMatrix does not support the use of functions in queries. For example, consider this SQL query containing the function func():

SELECT func(a) FROM table1;

This function is not supported in the query if all of the following conditions are met:

  • The data of table1 is distributed on YMatrix database segments.
  • Function func() reads from or modifies data in distributed tables.
  • Function func() returns multiple rows or takes arguments from table1 (a).

If none of the conditions are met, the function is supported. Specifically, the function is supported if any of the following conditions are met:

  • The function func() does not access data in distributed tables and only accesses data on the YMatrix database master.
  • Table table1 is only on the master.
  • The function func() returns only one row and only accepts constant input parameters. This function is supported if it can be modified to require no input parameters.

Using the EXECUTE ON Property

Most functions that perform queries to access tables can only be executed on the master. However, functions that only perform SELECT queries on replicated tables can run on segments. If the function accesses a hash-distributed table or a randomly distributed table, the function should be defined using the EXECUTE ON MASTER property. Otherwise, when using the function in a complex query, it may return incorrect results. Without this property, optimizer optimizations may determine that pushing a function call to a segment instance would be beneficial.

Here are the limitations of functions defined with the EXECUTE ON MASTER or EXECUTE ON ALL SEGMENTS attribute:

  • The function must be a function that returns a set.
  • The function cannot be in the FROM clause of the query.
  • The function cannot be in the SELECT list of queries with FROM clauses.
  • Queries containing this function are returned from GPORCA to the Postgres query planner.

The EXECUTE ON INITPLAN property indicates that the function contains a SQL command that dispatches queries to segment instances and requires special processing on the master. YMatrix will process functions on the master in the following ways:

  • First, YMatrix runs the function as part of the InitPlan node on the master and temporarily saves the function output.
  • Then, in the MainPlan of the query plan, the function is called in EntryDB (a special query executor (QE) running on the master), and YMatrix returns the function as the initialization plan node. This function does not run in MainPlan.

In this simple example, the function get_data() is used in a CTAS command to create a table using data from the country table. This function contains a SELECT command that retrieves data from the country table and uses the EXECUTE ON INITPLAN attribute:

CREATE TABLE country( 
  c_id integer, c_name text, region int) 
  DISTRIBUTED RANDOMLY;

INSERT INTO country VALUES (11,'INDIA', 1 ), (22,'CANADA', 2), (33,'USA', 3);

CREATE OR REPLACE FUNCTION get_data()
  RETURNS TABLE (
   c_id integer, c_name text
   )
AS $$
  SELECT
    c.c_id, c.c_name
  FROM
    country c;
$$
LANGUAGE SQL EXECUTE ON INITPLAN;

CREATE TABLE t AS SELECT * FROM get_data() DISTRIBUTED RANDOMLY;

If you use EXPLAIN ANALYZE VERBOSE to view the query plan for the CTAS command, the plan shows that the function is executed as part of the InitPlan node and one of the listed slices is marked as entry db. A query plan for a simple CTAS command without the function does not have an InitPlan node or entrydb slice.

If the function does not have the EXECUTE ON INITPLAN attribute, the CTAS command returns an error indicating that the function cannot be executed on the QE slice.

When a function uses the EXECUTE ON INITPLAN attribute, use the function's command, such as CREATE TABLE t AS SELECT * FROM get_data(), to collect the function's result on the master and then redistribute the result to the segment instances when the data is inserted. If the function returns a large amount of data, the master may become a bottleneck in collecting and redistributing the data. Performance may improve if you rewrite the function to run the CTAS command within a user-defined function and use the table name as an input parameter. In this example, the function runs the CTAS command and does not require the EXECUTE ON INITPLAN attribute. Executing the SELECT command creates table t1 using the function that runs the CTAS command.

CREATE OR REPLACE FUNCTION my_ctas(_tbl text) RETURNS VOID AS
$$
BEGIN
  EXECUTE format('CREATE TABLE %s AS SELECT c.c_id, c.c_name FROM country c DISTRIBUTED RANDOMLY', _tbl);
END
$$
LANGUAGE plpgsql;

SELECT my_ctas('t1');

Example

A very simple addition function:

CREATE FUNCTION add(integer, integer) RETURNS integer
   AS 'select $1 + $2;'
   LANGUAGE SQL
   IMMUTABLE
   RETURNS NULL ON NULL INPUT;

Increment an integer using the parameter name in PL/pgSQL:

CREATE OR REPLACE FUNCTION increment(i integer) RETURNS
integer AS $$
        BEGIN
                RETURN i + 1;
        END;
$$ LANGUAGE plpgsql;

Add the default segment host memory for each query to the PL/pgSQL function:

CREATE OR REPLACE FUNCTION function_with_query() RETURNS
SETOF text AS $$
        BEGIN
                RETURN QUERY
                EXPLAIN ANALYZE SELECT * FROM large_table;
        END;
$$ LANGUAGE plpgsql
SET statement_mem='256MB';

Return an ENUM array using polymorphic types:

CREATE TYPE rainbow AS ENUM('red','orange','yellow','green','blue','indigo','violet');
CREATE FUNCTION return_enum_as_array( anyenum, anyelement, anyelement )
    RETURNS TABLE (ae anyenum, aa anyarray) AS $$
    SELECT $1, array[$2, $3]
$$ LANGUAGE SQL STABLE;

SELECT * FROM return_enum_as_array('red'::rainbow, 'green'::rainbow, 'blue'::rainbow);

Returns a record containing multiple output parameters:

CREATE FUNCTION dup(in int, out f1 int, out f2 text)
   AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
   LANGUAGE SQL;

SELECT * FROM dup(42);

You can use explicitly named composite types to do the same in more detail:

CREATE TYPE dup_result AS (f1 int, f2 text);
CREATE FUNCTION dup(int) RETURNS dup_result
    AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
    LANGUAGE SQL;

SELECT * FROM dup(42);

Another way to return multiple columns is to use the TABLE function:

CREATE FUNCTION dup(int) RETURNS TABLE(f1 int, f2 text)
   AS $$ SELECT $1, CAST($1 AS text) || ' is text' $$
   LANGUAGE SQL;

SELECT * FROM dup(4);

The function is defined with EXECUTE ON ALL SEGMENTS to run on all primary instances. The SELECT command executes a function that returns the runtime on each segment instance:

CREATE FUNCTION run_on_segs (text) returns setof text as $$
  begin
    return next ($1 || ' - ' || now()::text );
  end;
$$ language plpgsql VOLATILE EXECUTE ON ALL SEGMENTS;

SELECT run_on_segs('my test');

This function looks for part names in the parts table. The parts table is a replicated table, so the function can be executed on the master or primary:

CREATE OR REPLACE FUNCTION get_part_name(partno int) RETURNS text AS
$$
DECLARE
   result text := ' ';
BEGIN
    SELECT part_name INTO result FROM parts WHERE part_id = partno;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

If you execute SELECT get_part_name(100); on the master, the function is executed on the master. (The master instance directs the query to a single primary.) If the orders table is a distributed table, and you execute the following query, the get_part_name() function will be executed on the primary:

SELECT order_id, get_part_name(orders.part_no) FROM orders;

Compatibility

CREATE FUNCTION is defined in SQL:1999 and later. The YMatrix version is similar but not fully compatible. These properties are not portable, nor are they available in different languages.

For compatibility with some other database systems, argmode can be written before or after argname. However, only the first method conforms to the standards.

For parameter default values, the SQL standard specifies only the syntax using the DEFAULT keyword. The = syntax is used in T-SQL and Firebird.

See also

ALTER FUNCTION, DROP FUNCTION