Reference
1 Platforms
opencl.get_platforms()-
Returns a sequence with the available platforms.
platform:get_info(name)-
Queries information about the platform.
namecan be any of the following, in which case the function returns:- “profile”: a string, in which case the platform
supports
- “FULL_PROFILE”: the full OpenCL specification,
- “EMBEDDED_PROFILE”: a subset of the OpenCL specification;
- “version”: a string with the OpenCL version;
- “name”: a string with the platform name;
- “vendor”: a string with the platform vendor;
- “extensions”: a string with a space-separated list of supported extensions.
- “profile”: a string, in which case the platform
supports
2 Devices
platform:get_devices([device_type])-
Returns a sequence of devices available on the platform.
device_typecan be one or a sequence of the following:- “cpu”: a host processor;
- “gpu”: a graphics processor;
- “accelerator”: a dedicated accelerator;
- “default”: the default device(s) of the platform, of any of the above types;
- “custom” (OpenCL 1.2): a dedicated accelerator that does not support OpenCL C kernels.
If
device_typeis omitted, the function returns all devices not of type “custom”. device:create_sub_devices(name, value)-
Partitions the compute units of the device and returns a sequence of sub-devices.
namespecifies the partition type, wherevaluespecifies- “equally”: the number of compute units per partition;
- “by_counts”: a sequence with the number of compute units of each partition;
- “by_affinity_domain”: the cache hierarchy shared by
the compute units of each partition:
- “numa”: a NUMA node,
- “l4_cache”: a level 4 cache,
- “l3_cache”: a level 3 cache,
- “l2_cache”: a level 2 cache,
- “l1_cache”: a level 1 cache,
- “next_partitionable”: the first cache level along which the device may be partitioned.
This function is available with OpenCL 1.2.
device:get_info(name)-
Queries information about the device.
namecan be any of the following, in which case the function returns:- “address_bits”: the size of the device address space in bits (32 or 64);
- “available”: whether the device is available (true or false);
- “built_in_kernels” (OpenCL 1.2): a semi-colon separated list of built-in kernels;
- “compiler_available”: whether an OpenCL C compiler is available (true or false);
- “double_fp_config” (OpenCL 1.2): a table
with double precision floating-point capabilities (true
or nil):
- denorm: denormal numbers,
- inf_nan: infinity and not-a-number,
- round_to_nearest: round to nearest even number,
- round_to_zero: round to zero,
- round_to_inf: round towards positive and negative infinity,
- fma: fused multiply-add,
- soft_float: software floating-point implementation;
- “endian_little”: whether the device has little (true) or big (false) endianness;
- “error_correction_support”: whether the device corrects memory errors (true or false);
- “execution_capabilities”: a table with execution
capabilities (true or nil):
- kernel: the device can execute kernel functions,
- native_kernel: the device can execute host functions;
- “extensions”: a space separated list of extension names;
- “global_mem_cache_size”: the size of the global memory cache in bytes;
- “global_mem_cache_type”: the type of the global
memory cache:
- “read_only”: read-only,
- “read_write”: read-write cache;
- “global_mem_cacheline_size”: the size of a global memory cache line in bytes;
- “global_mem_size”: the size of the global device memory in bytes;
- “half_fp_config” (OpenCL 1.1): a table
with half precision floating-point capabilities (true
or nil):
- denorm: denormal numbers,
- inf_nan: infinity and not-a-number,
- round_to_nearest: round to nearest even number,
- round_to_zero: round to zero,
- round_to_inf: round towards positive and negative infinity,
- fma: fused multiply-add,
- soft_float: software floating-point implementation;
- “host_unified_memory” (OpenCL 1.1): whether device and host have a unified memory subsystem (true or false);
- “image_support”: whether the device supports images (true or false);
- “image2d_max_height”: maximum height of a 2D image in pixels;
- “image2d_max_width”: maximum width of a 2D image, or a 1D image not created from a buffer, in pixels;
- “image3d_max_depth”: maximum depth of a 3D image in pixels;
- “image3d_max_height”: maximum height of a 3D image in pixels;
- “image3d_max_width”: maximum width of a 3D image in pixels;
- “image_max_buffer_size” (OpenCL 1.2): maximum number of pixels in a 1D image created from a buffer;
- “image_max_array_size” (OpenCL 1.2): maximum number of images in a 1D or 2D image array;
- “linker_available” (OpenCL 1.2): whether a linker is available (true or false);
- “local_mem_size”: the size of the local memory in bytes;
- “local_mem_type”: the type of the local memory:
- “local”: dedicated local memory storage,
- “global”: global device memory;
- “max_clock_frequency”: the maximum clock frequency of the device in MHz;
- “max_compute_units”: the maximum number of compute units on the device;
- “max_constant_args”: the maximum number of kernel
arguments with a
__constantqualifier; - “max_constant_buffer_size”: the maximum size of a constant buffer in bytes;
- “max_mem_alloc_size”: the maximum size of a memory buffer;
- “max_parameter_size”: the maximum size in bytes of the arguments passed to a kernel function;
- “max_read_image_args”: the maximum number of images that can be read by a kernel function;
- “max_samplers”: the maximum number of samples that can be used by a kernel function;
- “max_work_group_size”: the maximum number of work-items in a work-group;
- “max_work_item_dimensions”: the maximum number of dimensions of the global and local work-item IDs;
- “max_work_item_sizes”: a sequence with the maximum number of work-items for each dimension;
- “max_write_image_args”: the maximum number of images that can be written to by a kernel function;
- “mem_base_addr_align”: the alignment in bytes of the starting address of a buffer;
- “min_data_type_align_size”: the smallest alignment in bytes for any data type;
- “name”: the device name;
- “native_vector_width_char” (OpenCL 1.1):
native number of scalar elements in a
charvector; - “native_vector_width_short” (OpenCL 1.1):
native number of scalar elements in a
shortvector; - “native_vector_width_int” (OpenCL 1.1):
native number of scalar elements in a
intvector; - “native_vector_width_long” (OpenCL 1.1):
native number of scalar elements in a
longvector; - “native_vector_width_float” (OpenCL 1.1):
native number of scalar elements in a
floatvector; - “native_vector_width_double” (OpenCL 1.1):
native number of scalar elements in a
doublevector; - “native_vector_width_half” (OpenCL 1.1):
native number of scalar elements in a
halfvector; - “opencl_c_version” (OpenCL 1.1): a string with the supported OpenCL C version;
- “parent_device” (OpenCL 1.2): the parent device of a sub-device;
- “partition_max_sub_devices” (OpenCL 1.2): the maximum number of sub-devices;
- “partition_properties” (OpenCL 1.2): a
table with the supported partition types (true or
nil):
- equally: partitions of equal size,
- by_counts: partitions of different sizes,
- by_affinity_domain: partitions by affinity domains;
- “partition_affinity_domain” (OpenCL 1.2):
a table with the supported affinity domains (true or
nil):
- numa: partition by NUMA node,
- l4_cache: partition by L4 cache,
- l3_cache: partition by L3 cache,
- l2_cache: partition by L2 cache,
- l1_cache: partition by L1 cache,
- next_partitionable: partition by next partitionable affinity domain;
- “partition_type” (OpenCL 1.2): the partition property and the partition property value of the sub-device;
- “platform”: the platform;
- “preferred_vector_width_char”: preferred number of
scalar elements in a
charvector; - “preferred_vector_width_short”: preferred number of
scalar elements in a
shortvector; - “preferred_vector_width_int”: preferred number of
scalar elements in a
intvector; - “preferred_vector_width_long”: preferred number of
scalar elements in a
longvector; - “preferred_vector_width_float”: preferred number of
scalar elements in a
floatvector; - “preferred_vector_width_double”: preferred number
of scalar elements in a
doublevector; - “preferred_vector_width_half”: preferred number of
scalar elements in a
halfvector; - “printf_buffer_size” (OpenCL 1.2): maximum size in bytes of the buffer that holds the output of printf;
- “preferred_interop_user_sync” (OpenCL 1.2): whether user synchronization is preferred (true or false);
- “profile”: a string, in which case the device
supports
- “FULL_PROFILE”: the full OpenCL specification,
- “EMBEDDED_PROFILE”: a subset of the OpenCL specification;
- “profiling_timer_resolution”: the resolution of the device timer in nanoseconds;
- “queue_properties”: a table with the supported
command-queue properties (true or
nil):
- out_of_order_exec_mode: out-of-order execution,
- profiling: execution profiling;
- “single_fp_config”: a table with single precision
floating-point capabilities (true or
nil):
- denorm: denormal numbers,
- inf_nan: infinity and not-a-number,
- round_to_nearest: round to nearest even number,
- round_to_zero: round to zero,
- round_to_inf: round towards positive and negative infinity,
- fma: fused multiply-add,
- soft_float (OpenCL 1.1): software floating-point implementation;
- “type”: a table with the device types
(true or nil):
- cpu: a host processor,
- gpu: a graphics processor,
- accelerator: a dedicated accelerator,
- default: a default device of the platform,
- custom: a dedicated accelerator that does not support OpenCL C kernels;
- “vendor”: a string with the vendor name;
- “vendor_id”: a number that uniquely identifies the vendor;
- “version”: a string with the supported OpenCL version.
- “driver_version”: a string with the OpenCL driver version;
3 Contexts
opencl.create_context(devices)-
Creates and returns a context given a sequence of devices.
context:get_info(name)-
Queries information about the context.
namecan be any of the following, in which case the function returns:- “num_devices” (OpenCL 1.1): the number of devices in the context;
- “devices”: a sequence with the devices in the context.
4 Memory objects
context:create_buffer([flags, ]size[, host_ptr])-
Creates and returns a buffer object of the given size in bytes.
flagscan be one or a sequence of the following:- “read_write”: the buffer will be read and written by a kernel (the default);
- “write_only”: the buffer will be written but not read by a kernel;
- “read_only”: the buffer will be read but not written by a kernel;
- “use_host_ptr”: the buffer should be stored in the
memory given by
host_ptr; - “alloc_host_ptr”: the buffer should be allocated in host-accessible memory;
- “copy_host_ptr”: initialize the buffer by copying
data from
host_ptr; - “host_write_only” (OpenCL 1.2): the buffer will be written but not read by the host;
- “host_read_only” (OpenCL 1.2): the buffer will be read but not written by the host;
- “host_no_access” (OpenCL 1.2): the buffer will be neither read nor written by the host.
mem:create_sub_buffer([flags, ]create_type, create_info)-
Creates and returns a buffer object from an existing buffer object.
create_typecan be one of the following:“region”: creates a sub-buffer from a region of the given buffer.
create_infospecifies a sequence with the origin and size of the region in bytes.
This function is available with OpenCL 1.1.
mem:get_info(name)-
Queries information about the memory object.
namecan be any of the following, in which case the function returns:- “type”: the type of the memory object:
- “buffer”: a memory buffer,
- “image1d”: a 1D image,
- “image1d_buffer”: a 1D image created from a buffer,
- “image1d_array”: a 1D array image,
- “image2d”: a 2D image buffer,
- “image2d_array”: a 2D array image,
- “image3d”: a 3D image buffer;
- “flags”: a table with the flags specified when the memory object was created (true or nil);
- “size”: the size of the memory object in bytes;
- “host_ptr”: the host pointer specified when the memory object was created;
- “map_count”: the number of mapped memory regions;
- “context”: the context of the memory object;
- “associated_memobject” (OpenCL 1.1): the memory object associated with a sub-buffer;
- “offset” (OpenCL 1.1): the offset in bytes of a sub-buffer.
- “type”: the type of the memory object:
5 Programs
context:create_program_with_source(source)-
Creates and returns a program from source code.
sourcespecifies a string that contains the OpenCL C source code. program:build([[devices, ]options])-
Compiles and links an executable from program source or binary. An optional argument
devicesspecifies a sequence of devices that the program is built for; otherwise, ifdevicesis omitted, the program is built for all devices in the context. A second, optional argumentoptionsspecifies a string that contains compiler options. program:get_info(name)-
Queries information about the program.
namecan be any of the following, in which case the function returns:- “context”: the context of the program;
- “num_devices”: the number of devices associated with the program;
- “devices”: a sequence with the devices associated with the program;
- “source”: a string with the source code of the program.
- “binary_sizes”: a sequence with the sizes in bytes of the program binaries;
- “binaries”: a sequence of strings with the programs binaries;
- “num_kernels” (OpenCL 1.2): the number of kernels in the program;
- “kernel_names” (OpenCL 1.2): a string with a semicolon-separated list of kernel names;
program:get_build_info(device, name)-
Query build information for a device about the program.
namecan be any of the following, in which case the function returns:- “status”: the build status:
- “error”: the build failed,
- “success”: the build succeeded,
- “in_progress”: the build has not finished;
- “options”: a string with the compiler output;
- “log”: a string with the compiler options;
- “binary_type” (OpenCL 1.2): the program
binary type:
- “compiled_object”: the program has been compiled to an object file,
- “library”: the program has been linked into a library,
- “executable”: the program has been linked into an executable.
- “status”: the build status:
6 Kernels
program:create_kernel(name)-
Returns a kernel object for the kernel with the given name.
program:create_kernels_in_program()-
Returns a sequence of kernel objects in the program.
kernel:set_arg(index, [size, ]value)-
Sets the value of the kernel argument at the given 0-based
index.For
__globalor__constantarguments,valueis a memory object, or nil.For arguments of type
sampler_t,valuespecifies a sampler object.For
__localarguments,sizespecifies the amount of local memory in bytes.For other arguments,
valueis a non-scalar C data object that containssizebytes. kernel:get_info(name)-
Queries information about the kernel.
namecan be any of the following, in which case the function returns:- “function_name”: the name of the kernel function;
- “num_args”: the number of kernel arguments;
- “context”: the context;
- “program”: the program;
- “attributes” (OpenCL 1.2): a string with a space-separated list of kernel attributes.
kernel:get_arg_info(index, name)-
Queries information about the kernel argument at the given 0-based
index.namecan be any of the following, in which case the function returns:- “address_qualifier”: the address qualifier
of the argument type:
- “global”: global memory,
- “local”: local memory,
- “constant”: constant memory,
- “private”: private memory (the default);
- “access_qualifier”: the access
qualifier of the argument type:
- “read_only”: read-only image,
- “write_only”: write-only image,
- “read_write”: read-write image;
- “type_name”: the name of the argument type without qualifiers;
- “type_qualifier”: a table with the argument type
qualifiers (true or nil):
- “const”: constant type,
- “restrict”: restricted pointer type,
- “volatile”: volatile pointer type;
- “name”: the name of the argument.
This function is available with OpenCL 1.2.
- “address_qualifier”: the address qualifier
of the argument type:
kernel:get_work_group_info([device, ]name)-
Query work group information for a device about the kernel.
devicemay be omitted if the kernel is associated with a single device.namecan be any of the following, in which case the function returns:- “global_work_size” (OpenCL 1.2): a sequence with the maximum global sizes, for a custom device;
- “work_group_size”: the maximum work-group size;
- “compile_work_group_size”: a sequence with the required work-group sizes;
- “local_mem_size”: the amount of local memory in bytes used by a work group;
- “preferred_work_group_size_multiple” (OpenCL 1.1): the preferred multiple of the work-group size;
- “private_mem_size” (OpenCL 1.1): the amount of private memory used by a work item.
7 Command Queues
context:create_command_queue(device[, properties])-
Creates and returns a command queue on the given device.
propertiescan be one or a sequence of the following:- “out_of_order_exec_mode”: enable out-of-order execution of commands;
- “profiling”: enable profiling of commands.
queue:get_info(name)-
Queries information about the command queue.
namecan be any of the following, in which case the function returns:- “context”: the context;
- “device”: the device;
- “properties”: a table with the command-queue properties.
queue:enqueue_ndrange_kernel(kernel, global_offset, global_size[, local_size[, events]])-
Enqueues a command that executes a kernel, and returns an event object for the command.
global_sizeis a sequence that specifies the number of global work items for each dimension.global_offsetis a sequence that specifies the corresponding offsets of the global IDs; or nil, in which case the offsets default to {0, …, 0}.local_sizeis a sequences that specifies the number of work items per work-group for each dimension; or nil, in which case the work-group size is chosen by the implementation.If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_map_buffer(buffer, blocking, flags[, offset, size[, events]])-
Enqueues a command that maps a region from a buffer object into host memory, and returns a pointer to the mapped memory region and an event object for the command. If
blockingis true, the function returns after the data is accessible from host memory; otherwise, if false, the function returns immediately. The command maps the buffer region specified byoffsetandsizein bytes (default values are 0 and the size of the buffer).flagscan be one or a sequence of the following:- “read”: the buffer region is mapped for reading;
- “write”: the buffer region is mapped for writing;
- “write_invalidate_region” (OpenCL 1.2): the buffer region is mapped for writing only.
If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_unmap_mem_object(mem, ptr[, events])-
Enqueues a command that unmaps a mapped memory region for the given memory object, and returns an event object for the command.
If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_read_buffer(buffer, blocking, [offset, size,] ptr[, events])-
Enqueues a command that reads from a buffer object to host memory, and returns an event object for the command. If
blockingis true, the function returns after the data has been written to host memory; otherwise, if false, the function returns immediately. The command reads from the buffer region specified byoffsetandsizein bytes (default values are 0 and the size of the buffer). The command writes to the host memory region specified byptr.If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_write_buffer(buffer, blocking, [offset, size,] ptr[, events])-
Enqueues a command that writes to a buffer object from host memory, and returns an event object for the command. If
blockingis true, the function returns after the data has been read from host memory; otherwise, if false, the function returns immediately. The command writes to the buffer region specified byoffsetandsizein bytes (default values are 0 and the size of the buffer). The command reads from the host memory region specified byptr.If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_copy_buffer(src_buffer, dst_buffer[, src_offset, dst_offset, size[, events]])-
Enqueues a command that copies from a source buffer to a destination buffer, and returns an event object for the command. The command copies
sizebytes from the source buffer region specified bysrc_offsetin bytes to the destination buffer region specified bydst_offsetin bytes (default values are 0 forsrc_offsetanddst_offset, and the size of the source buffer forsize).If
eventsis specified, the command is executed after the given events have completed. queue:enqueue_fill_buffer(buffer, pattern, pattern_size[, offset, size[, events]])-
Enqueues a command that fills a buffer with a constant value, and returns an event object for the command. The command reads
pattern_sizebytes from the buffer pointed to bypattern, and fills the buffer region specified byoffsetandsizein bytes (default values are 0 and the size of the buffer), whereoffsetandsizemust be multiples ofpattern_size.If
eventsis specified, the command is executed after the given events have completed.This function is available with OpenCL 1.2.
queue:enqueue_marker_with_wait_list([events])-
Enqueues a command that waits for the given events to complete, and returns an event object for the command. If
eventsis omitted, the command waits for all previously enqueued commands.This function is available with OpenCL 1.2.
queue:enqueue_marker()-
Enqueues a command that waits for all previously enqueued commands to complete, and returns an event object for the command.
This function has been deprecated in OpenCL 1.2.
queue:enqueue_barrier_with_wait_list([events])-
Enqueues a command that waits for the given events to complete, and returns an event object for the command. If
eventsis omitted, the command waits for all previously enqueued commands.The command blocks execution of all subsequently enqueued commands.
This function is available with OpenCL 1.2.
queue:enqueue_barrier()-
Enqueues a command that for all previously enqueued commands to complete.
The command blocks execution of all subsequently enqueued commands.
This function has been deprecated in OpenCL 1.2.
queue:enqueue_wait_for_events(events)-
Enqueues a command that waits for the given events to complete.
The command blocks execution of all subsequently enqueued commands.
This function has been deprecated in OpenCL 1.2.
queue:flush()-
Submits all previously enqueued commands to the device. The function ensures that the commands will eventually be executed, which is useful when the enqueued commands are being waited on by commands in a different queue.
queue:finish()-
Blocks until all previously enqueued commands have completed.
8 Events
opencl.wait_for_events(events)-
Blocks until the commands associated with the given events have completed.
event:get_info(name)-
Queries information about the event.
namecan be any of the following, in which case the function returns:- “command_queue”: the command queue associated with the event;
- “context” (OpenCL 1.1): the context associated with the event;
- “command_type”: a string with the command associated with the event;
- “command_execution_status”: the execution status of
the command:
- “queued”: the command has been enqueued in the command queue,
- “submitted”: the command has been submitted to the device,
- “running”: the execution of the command has started,
- “complete”: the execution of the command has finished;
event:get_profiling_info(name)-
Queries profiling information about the command associated with the event.
The function returns a 64-bit integer with a device time in nanoseconds.
namecan be any of the following, in which case the function returns- “queued”: the time when the command was enqueued;
- “submit”: the time when the command was submitted to the device;
- “start”: the time when the execution of the command started;
- “end”: the time when the execution of the command finished.