API

BasicType Reference

class Boolean

References True of False value condition.

class Integer

References a non-decimal number.

class Float

References a decimal number.

class String

References a text type.

class Json

References a json type, each language implements it in a different way.

class Pair(K, V)

References a combination of key-value types stored together as a pair.

Parameters
  • K – Key type.

  • V – Value type.

class List(T)

References a ordered collection.

Parameters

T – Element type.

class Map(K, V)

References a mapping between a key and a value.

Parameters
  • K – Key type.

  • V – Value type.

class Iterable(T)

References a collection capable of returning its members one at a time.

Parameters

T – Element type.

Driver

Ignis

The class Ignis manages the driver environment. Any driver function called before Ignis.start() and after Ignis.stop() will fail.

class Ignis
static start()

Starts the driver environment. The backend module is launched as a sub-process and the other driver functions can now be called. The function will not return until the entire backend configuration process has been completed.

static stop()

Stops the driver environment. The Backend releases all resources and finishes its execution. The function will not return until backend has finished.

IProperties

The class IProperties represents a persistent set of properties. Properties can be read, modified or deleted, initially instances do not contain any properties. If a property that is not stored is read, its default value will be returned if it exists.

class IProperties
set(key, value)

Sets a new property with the specified key.

Parameters
  • key (String) – Property key.

  • value (String) – Property value.

Returns

previous value for key or an empty string.

Return type

String

get(key)

Searches for the property with the specified key. If the key is not found, default value is returned.

Parameters

key (String) – Property key.

Returns

value for key or an empty string if it has no default value..

Return type

String

rm(key)

Removes a property with the specified key and returns its current value.

Parameters

key (String) – Property key.

Returns

value for key or an empty string.

Return type

String

contains(key)

Returns True if property with the specified key has a value or a default value.

Parameters

key (String) – Property key.

Returns

property with key is defined.

Return type

Boolean

toMap(defaults)

Gets all properties and their values.

Parameters

defaults (Boolean) – if true, unstored properties with default values are also returned.

Returns

all properties and their values.

Return type

Map(String, String)

fromMap(map)

Sets all properties defined in the argument.

Parameters

map (Map(String, String)) – A set of properties with their values.

load(path)

Sets all properties defined in the file references by the path. The file must be formatted as .properties format where each line stores a property as key=value or key:value format.

Parameters

path (String) – File path.

Raises

IDriverException – An error is generated if the file does not exist, cannot be read or has an incorrect format.

store(path)

Stores all properties defined in the file references by the path.

Parameters

path (String) – File path.

Raises

IDriverException – An error is generated if the file cannot be created.

clear()

Removes all properties.

ICluster

The class ICluster represents a group of executors containers. Containers are identical instances with the same assigned resources, which are obtained from the properties defined in IProperties.

class ICluster(properties, name)
Parameters
  • properties (IProperties) – Set of properties that will be used to configure the execution environment. Future modifications to the properties will have no effect.

  • name (String) – (Optional) Gives a name to the ICluster, it will be used to identify the ICluster in the job logs and also in the Scheduler, if it supports it.

start()

By default, the cluster will only be started when the first computation is to be performed. This function allows you to force their creation and eliminate the time associated with requesting and granting resources. It must be used to perform performance measurements on the platform.

destroy()

Destroys the current running environment and frees all resources associated with it. Future executions will have to recreate the environment from scratch.

setName(name)

Sets or changes the name associated with the ICluster. The new name will only affect the ICluster log itself and future tasks created. The Scheduler and the existing tasks will keep the name used during their creation.

Parameters

name (String) – New name.

execute(args)

Runs a command on all containers associated with the ICluster. This function does not trigger the creation of the ICluster, it will only be executed if the environment has already been created previously, otherwise the function will be registered to be invoked immediately after its creation.

Parameters

args (List(String)) – Command and its arguments.

executeScript(script)

Like ICluster.execute() but argument is a shell script instead of single command.

Parameters

script (String) – Linux Shell script.

sendFile(source, target)

Sends a file to all containers associated with the ICluster. This function does not trigger the creation of the ICluster, the file only be sent if the environment has already been created previously, otherwise the function will be registered to be invoked immediately after its creation.

Parameters
  • source (String) – Source path in driver container.

  • target (String) – Target path in each executor container.

sendCompressedFile(source, target)

Like ICluster.sendFile() but file is extracted once it has been sent. Supported formats are: .tar, .tar.bz2, .tar.bz, .tar.xz, .tbz2, .tgz, .gz, .bz2, .xz, .zip, .Z. Note that .rar is also supported, but its license requires it to be installed by the user.

ISource

The class ISource is an auxiliary class used by meta-functions in the driver. A meta-function is a function that defines part of its implementation using another function that is passed as a parameter. The way in which the function is defined depends on each implementation.

Typically the following format should be available:

  1. Ignis path: String representation consisting of a file path and a class. The file indicates where the code is stored and the class defines the function to be executed. Format is as follows: path:class

  2. Name: Defines only the name of the function, it is also defined as a string and differs from the previous case because it does not contain : separator.

  3. Source Code: Function is defined using the syntax of the executor’s source code. Executor will recognize it as source code and compile it if necessary.

  4. Lambda: The function is defined in the driver code and then sent as bytes to the executor. In this case driver and executor must be programmed in the same programming language and it must support serialization of executable code.

class ISource(function, native)
Parameters
  • function – Overloaded argument to accept all possible function definitions supported in each implementation.

  • native (Boolean) – (Optional) Type of serialization used to send parameters. If true, the driver language’s own serialization will be used, if and only if the executor also has the same language. Otherwise the multi-language serialization will always be used.

addParam(name, value)

Defines a parameter associated with the function. The value of the parameter can be obtained by the get function during its execution.

Parameters
  • name (String) – Parameter name.

  • value – Value to be stored in the parameter, can have any type.

Returns

This ISource instance.

Return type

ISource

IWorker

The class IWorker represents a group of processes of the same programming language. There is at least one process in each of the ICluster containers where the worker is created, and all containers have the same number of executor processes.

class IWorker(cluster, type, name, cores, instances)
Parameters
  • cluster (ICluster) – ICluster where the executors will be created.

  • type (String) – Name of the worker to be used, the names of the workers are associated to the programming language they execute. The available workers are associated with the image used to create the class ICluster.

  • name (String) – (Optional) Like ICluster a worker can have a name that identifies it in the job log.

  • cores (Integer) – (Optional) Number of cores associated to each executor, by default each executor uses all available cores inside the container.

  • instances (Integer) – (Optional) Number of executors to be launched in each container, by default only one is launched.

start()

By default, the worker will only be started when the first computation is to be performed. This function allows you to force their creation.

destroy()

Destroys all processes associated with the worker. Future executions will have to start the processes again. Destroying the executors means deleting cached elements in memory, only disk cache will be kept.

getCluster()

gets ICluster where worker is created.

setName(name)

Sets or changes the name associated with the IWorker. The new name will only affect the worker log itself and future tasks created. Existing tasks will keep the name used during their creation.

Parameters

name (String) – New name.

parallelize(data, partitions, src, native)

Creates a IDataFrame from an existing collection present in the driver. The elements present in the collection are distributed to the executors for a parallel processing.

Parameters
  • data (Iterable(T)) – A collection object present in the driver.

  • partitions (Integer) – How many partitions the collection elements will be divided. For optimal processing, there should be at least one partition for all cores on each of the executors.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at least IBeforeFunction interface.

  • native (Boolean) – (Optional) Type of serialization used to send data. If true, the driver language’s own serialization will be used, if and only if the executor also has the same language. Otherwise the multi-language serialization will always be used.

Returns

A parallel collection with the same type of data elements.

Return type

IDataFrame(T)

importDataFrame(data, src)

Imports a parallel collection from another worker. The number of partitions will be the same as in the original worker.

Parameters
  • data (IDataFrame(T)) – Parallel collection of source data.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at least IBeforeFunction interface.

Returns

A parallel collection with data elements.

Return type

IDataFrame(T)

textFile(path, minPartitions)

Creates a parallel collection by splitting a text file to create at least minPartitions partitions.

Parameters
  • path (String) – File path.

  • minPartitions (Integer) – Minimal number of partitions.

Returns

A parallel collection of strings.

Return type

IDataFrame(String)

Raises

IDriverException – An error is generated if the file does not exist or cannot be read.

plainFile(path, minPartitions, delim)

Creates a parallel collection by splitting a file using a custom delimiter to create at least minPartitions partitions.

Parameters
  • path (String) – File path.

  • minPartitions (Integer) – Minimal number of partitions. :delim String delim: A one-character string.

Returns

A parallel collection of strings.

Return type

IDataFrame(String)

Raises

IDriverException – An error is generated if the file does not exist or cannot be read.

partitionObjectFile(path, src)

Creates a parallel collection from binary partition files. See IDataFrame.saveAsObjectFile()

Parameters
  • path (String) – File path without the .part* extension.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A parallel collection with type stored in the binary file.

Return type

IDataFrame(T)

Raises

IDriverException – An error is generated if any file do not exist or cannot be read.

partitionTextFile(path)

Creates a parallel collection from text partition files. See IDataFrame.saveAsTextFile()

Parameters

path (String) – File path without the .part* extension.

Returns

A parallel collection of strings.

Return type

IDataFrame(String)

Raises

IDriverException – An error is generated if any file do not exist or cannot be read.

partitionJsonFile(path, src, objectMapping)

Creates a parallel collection from json partition files. See IDataFrame.saveAsJsontFile()

Parameters
  • path (String) – File path without the .part* extension.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at least IBeforeFunction interface.

  • objectMapping (Boolean) – (Optional) If true, json objects are transformed to objects.

Returns

A parallel collection of mapped object, if objectMapping is true or otherwise a generic json type is used.

Return type

IDataFrame(Json) or IDataFrame(T)

Raises

IDriverException – An error is generated if any file do not exist or cannot be read.

loadLibrary(path)

Loads a library of functions in the executor processes. Functions may be invoked using only their name in any ISource. Library type depends on the programming language of executor.

The library can be defined in two ways:

  1. Path to a library file. Library must be compiled if the language requires it.

  2. Source code in plain text, executor will take care of compiling if necessary. This allows you to create functions dynamically from the driver.

Parameters

path (String) – Library path or Source code.

Raises

IDriverException – An error is generated if libreary does not exist or cannot be read.

execute(src)

Runs a function in the executors.

Parameters

src (IIVoidFunction0 or ISource) – Function to be executed.

executeTo(src)

Runs a function in the executors and generates a parallel collection.

Parameters

src (IFunction0 or ISource) – Function to be executed.

Returns

A parallel collection created with the elements returned by the function.

Return type

IDataFrame(T)

call(src, data)

Runs a function that has been previously loaded by IWorker.loadLibrary(). Values returned by the function will generate a parallel collection. Note, this function is designed to execute functions in format name, it does not allow to use the other formats.

Parameters
Returns

A parallel collection created with the elements returned by src function.

Return type

IDataFrame(T)

voidCall(src, data)

Runs a function that has been previously loaded by IWorker.loadLibrary(). Like IWorker.call() but with no return.

Parameters
  • src (IVoidFunction or IVoidFunction0 or ISource) – Function name and its arguments. It must implement IVoidFunction interface if data is supplied or IVoidFunction0 otherwise. Note, this function is designed to execute functions in format name, it does not allow to use the other formats.

  • data (IDataFrame(T)) – (Optional) A parallel collection of elements to be processed by the src function.

IDataFrame

The class IDataFrame represents a parallel collection of elements distributed among the worker executors. All functions defined within this class process the elements in a parallel and distributed way.

class IDataFrame(T)
class T

Represents the type associated with the parallel collection. Dynamic languages do not have to make it visible to the user, it is the input value type for most of the functions defined in IDataFrame.

setName(name)

Sets or changes the name associated with the IDataFrame. The new name will affect only this IDataFrame and future tasks created from it.

Parameters

name (String) – New name.

persist(cacheLevel)

Sets a cache level for elements so that it only needs to be computed once.

Parameters

cacheLevel (ICacheLevel) – level of cache.

cache(cacheLevel)

Sets a cache level ICacheLevel.PRESERVE for elements so that it only needs to be computed once.

unpersist()

Elements cache is disabled. Alias for IDataFrame.uncahe.

uncahe()

Elements cache is disabled. Alias for IDataFrame.unpersist.

partitions()

Gets the number of partitions.

Returns

Number of partitions.

Return type

Integer.

saveAsObjectFile(path, compression)

Saves elements as binary files.

Parameters
  • path (String) – path to store the data.

  • compression (Integer) – compresion level (0-9).

Raises

IDriverException – An error is generated if files exists or cannot be write.

saveAsTextFile(path)

Saves elements as text files.

Parameters

path (String) – path to store the data.

Raises

IDriverException – An error is generated if files exists or cannot be write.

saveAsJsonFile(path, pretty)

Saves elements as json files.

Parameters
  • path (String) – path to store the data.

  • pretty (Boolean) – uses an ident format instead of compact.

Raises

IDriverException – An error is generated if files exists or cannot be write.

repartition(numPartitions, preserveOrdering, global)

Creates a new Dataframe with a fixes number of partitions.

Parameters
  • numPartitions (Integer) – number of partitions.

  • preserveOrdering (Boolean) – The order of the elements does not change.

  • global (Boolean) – Elements are balanced between different executors. If false, Elements are only balanced within each executor.

Returns

A Dataframe with numPartitions.

Return type

IDataFrame(T)

partitionByRandom(numPartitions, seed)

Creates a new Dataframe with a fixes number of partitions. Elements are randomly distributed among the executors.

Parameters

numPartitions (Integer) – number of partitions. :param Integer seed: Initializes the random number generator.

Returns

A Dataframe with numPartitions.

Return type

IDataFrame(T)

partitionByHash(numPartitions)

Creates a new Dataframe with a fixes number of partitions. Elements are distributed using a hash function among the executors.

Parameters

numPartitions (Integer) – number of partitions.

Returns

A Dataframe with numPartitions.

Return type

IDataFrame(T)

partitionBy(src, numPartitions)

Creates a new Dataframe with a fixes number of partitions. Elements are distributed using a custom function among the executors. The same function return assigns the same partition.

Parameters
Returns

A Dataframe with numPartitions.

Return type

IDataFrame(T)

map(src)

Performs a map operation.

Parameters

src (IFunction(T, R) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

mapWithIndex(src)

Performs a map operation. Like IDataFrame.map but global index of the element is available as the first argument of the function.

Parameters

src (IFunction2(Integer, T, R) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

filter(src)

Performs a filter operation. Only items that return True will be retained.

Parameters

src (IFunction(T, Boolean) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

flatmap(src)

Performs a flatmap operation. Like IDataFrame.map but each element can generate any number of results.

Parameters

src (IFunction(T, Iterable(R)) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

keyBy(src)

Assigns each element a key with the return of the function.

Parameters

src (IFunction(T, R) or ISource.) – Function argument.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(R, T)

mapPartitions(src, preservesPartitioning)

Performs a specialized map that is called only once for each partition, elements can be accessed using an iterator.

Parameters
Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

mapPartitionsWithIndex(src, preservesPartitioning)

Performs a specialized map that is called only once for each partition, elements can be accessed using an iterator. Like IDataFrame.mapPartitions but global index of the partition is available as the first argument of the function.

Parameters
Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

mapExecutor(src)

Performs a specialized map that is called only once for each executor, elements can be accessed using a list of lists where first list represents each partition. Function argument can be modified to add or remove values, if you want to generate other value type use :class: IDataFrame.mapExecutorTo.

Parameters

src (IVoidFunction(List(List(T))) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

mapExecutorTo(src)

Performs a specialized map that is called only once for each executor, elements can be accessed using a list of lists where first list represents each partition. A new list of lists must be returned to generate new partitions.

Parameters

src (IFunction(List(List(T)), List(List(R))) or ISource.) – Function argument.

Returns

A Dataframe with result elements.

Return type

IDataFrame(R)

groupBy(src, numPartitions)

Groups elements that share the same key, which is obtained from the return of the function.

Parameters
  • src (IFunction(T, R)) or ISource.) – Function argument.

  • numPartitions (Integer) – (Optional) Number of resulting partitions.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(R, List(T))

sort(ascending, numPartitions)

Sort the elements using their natural order.

Parameters
  • ascending (Boolean) – Allows you to choose between ascending and descending order.

  • numPartitions (Integer) – (Optional) Number of resulting partitions.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

sortBy(src, ascending, numPartitions)

Sort the elements using a custom function, that checks if the first argument is less than the second.

Parameters
  • src (IFunction2(T, T, Boolean)) or ISource.) – Function argument.

  • ascending (Boolean) – Allows you to choose between ascending and descending order.

  • numPartitions (Integer) – (Optional) Number of resulting partitions.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

union(other, preserveOrder, src)

Merges elements of two dataframes.

Parameters
  • other (IDataFrame(T)) – other dataframe.

  • preserveOrder (Boolean) – If true, the second dataframe is concatenated to the first, otherwise they are mixed.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A Dataframe with result elements of the two dataframes.

Return type

IDataFrame(T)

distinct(numPartitions, src)

Duplicate elements are eliminated.

Parameters
  • numPartitions (Integer) – Number of resulting partitions.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

reduce(src)

Accumulate the elements using a custom function, which must be associative and commutative. Like IDataFrame.treeReduce but final accumulation is performed in a single executor.

Parameters

src (IFunction2(T, T, T)) or ISource.) – Function argument.

Returns

Element resulting from accumulation.

Return type

T

treeReduce(src)

Accumulate the elements using a custom function, which must be associative and commutative. Like IDataFrame.reduce but final accumulation is performed in parallel using multiple executors.

Parameters

src (IFunction2(T, T, T)) or ISource.) – Function argument.

Returns

Element resulting from accumulation.

Return type

T

collect()

Retrieve all the elements.

Returns

All the elements.

Return type

List(T)

aggregate(zero, seqOp, combOp)

Accumulate the elements using two functions, which must be associative and commutative. Like :class: IDataFrame.treeAggregate` but final accumulation is performed in a single executor.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • seqOp (IFunction2(T, R, R)) or ISource.) – Function argument to accumulate the elements of each partition.

  • combOp (IFunction2(R, R, R)) or ISource.) – Function argument to accumulate the results of all partitions .

Returns

Element resulting from accumulation.

Return type

R

treeAggregate(zero, seqOp, combOp)

Accumulate the elements using two functions, which must be associative and commutative. Like IDataFrame.aggregate but final accumulation is performed in parallel using multiple executors.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • seqOp (IFunction2(T, R, R)) or ISource.) – Function argument to accumulate the elements of each partition.

  • combOp (IFunction2(R, R, R)) or ISource.) – Function argument to accumulate the results of all partitions .

Returns

Element resulting from accumulation.

Return type

R

fold(zero, src)

Accumulate the elements using a initial value and custom function, which must be associative and commutative. Like IDataFrame.treeFold but final accumulation is performed in a single executor.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • src (IFunction2(T, T, T)) or ISource.) – Function argument to accumulate.

Returns

Element resulting from accumulation.

Return type

T

treeFold(zero, src)

Accumulate the elements using a initial value and custom function, which must be associative and commutative. Like IDataFrame.treeFold but final accumulation is performed in parallel using multiple executors.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • src (IFunction2(T, T, T)) or ISource.) – Function argument to accumulate.

Returns

Element resulting from accumulation.

Return type

T

take(num)

Retrieves the first num elements.

Parameters

num (Integer) – Number of elements.

Returns

First num elements.

Return type

List(T)

foreach(src)

Calls a custom function once for each element.

Parameters

src (IVoidFunction(T) or ISource.) – Function argument.

foreachPartition(src)

Calls a custom function once for each partition, elements can be accessed using an iterator.

Parameters

src (IVoidFunction(IReadIterator(T)) or ISource.) – Function argument.

foreachExecutor(src)

Calls a custom function once for each executor, elements can be accessed using a list of lists where first list represents each partition.

Parameters

src (IVoidFunction(List(List(T))) or ISource.) – Function argument.

top(num, cmp)

Retrieves the first num elements in descending order. A custom function can be used to checks if the first argument is less than the second

Parameters
  • num (Integer) – Number of elements.

  • cmp (IFunction2(T, T, Boolean)) or ISource.) – (Optional) Comparator to be used instead of the natural order.

Returns

First num elements.

Return type

List(T)

takeOrdered(num, cmp)

Retrieves the first num elements in ascending order. A custom function can be used to checks if the first argument is less than the second

Parameters
  • num (Integer) – Number of elements.

  • cmp (IFunction2(T, T, Boolean)) or ISource.) – (Optional) Comparator to be used instead of the natural order.

Returns

First num elements.

Return type

List(T)

sample(withReplacement, fraction, seed)

Generates a random sample records from the original elements.

Parameters
  • withReplacement (Boolean) – An element can be selected more than once.

  • fraction (Float) – Percentage of the sample.

  • seed (Integer) – Initializes the random number generator.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

takeSample(withReplacement, num, seed)

Generates and Retrieves a random sample of num records from the original elements.

Parameters
  • withReplacement (Boolean) – An element can be selected more than once.

  • num (Integer) – Number of elements.

  • seed (Integer) – Initializes the random number generator.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

count()

Count the elements.

Returns

Number of elements.

Return type

Integer

max(cmp)

Retrieves the element with the maximum value. A custom function can be used to checks if the first argument is less than the second. Like Dataframe.top with num=1

Parameters
  • num (Integer) – Number of elements.

  • cmp (IFunction2(T, T, Boolean)) or ISource.) – (Optional) Comparator to be used instead of the natural order.

Returns

Element with the maximum value.

Return type

T

min(cmp)

Retrieves the element with the minimal value. A custom function can be used to checks if the first argument is less than the second. Like Dataframe.takeOrdered with num=1

Parameters
  • num (Integer) – Number of elements.

  • cmp (IFunction2(T, T, Boolean)) or ISource .) – (Optional) Comparator to be used instead of the natural order.

Returns

Element with the minimal value.

Return type

T

toPair()

Converts IDataFrame to IPairDataFrame when IDataFrame.T is a Pair of IPairDataFrame.K and IPairDataFrame.V.

Returns

A Dataframe of pairs

Return type

IPairDataFrame(K, V)

class IPairDataFrame(K, V)

Extends IDataFrame funtionality when IDataFrame.T is a Pair

class K

Represents the value type associated with the parallel collection. Dynamic languages do not have to make it visible to the user, it is the key input value type for most of the functions defined in IPairDataFrame.

class V

Represents the value type associated with the parallel collection. Dynamic languages do not have to make it visible to the user, it is the value input value type for most of the functions defined in IPairDataFrame.

join(other, preserveOrder, numPartitions, src)

Joins an element of this collection with an element of other that share the same key.

Parameters
  • other (IPairDataFrame(K, V)) – other dataframe.

  • numPartitions (Integer) – Number of resulting partitions.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, Pair(V, V))

flatMapValues(src)

Performs a map function only on the values while preserving the key. Like IPairDataFrame.mapValues but each element can generate any number of results, key is duplicated or deleted if necessary.

Parameters

src (IFunction(V, R) or ISource.) – Function argument.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, R)

mapValues(src)

Performs a map function only on the values while preserving the key.

Parameters

src (IFunction(V, R) or ISource.) – Function argument.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, R)

groupByKey(numPartitions, src)

Groups elements that share the same key.

Parameters
  • numPartitions (Integer) – Number of resulting partitions.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, List(V))

reduceByKey(src, numPartitions, localReduce)

Accumulate the values that share the same key using a custom function, which must be associative and commutative.

Parameters
  • src (IFunction2(V, V, V)) or ISource.) – Function argument.

  • numPartitions (Integer) – Number of resulting partitions.

  • localReduce (Boolean) – Accumulate the values that share the same key in a executor before global accumulation. Reduces the size of the exchange if there are duplicated keys in multiple partitions.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, V)

aggregateByKey(zero, seqOp, combOp, numPartitions)

Accumulate the values that share the same key using two functions, which must be associative and commutative.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • seqOp (IFunction2(V, R, R)) or ISource.) – Function argument to accumulate the values that share the same key of each partition.

  • combOp (IFunction2(R, R, R)) or ISource.) – Function argument to accumulate the results that share the same key of all partitions .

  • numPartitions (Integer) – Number of resulting partitions.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, V)

foldByKey(zero, src, numPartitions, localFold)

Accumulate the values that share the same key using a initial value and custom function, which must be associative and commutative.

Parameters
  • zero (IFunction0(R)) or ISource.) – Function argument to generate initial value of target type.

  • src (IFunction2(V, V, V)) or ISource.) – Function argument to accumulate.

  • numPartitions (Integer) – Number of resulting partitions.

  • localFold (Boolean) – Accumulate the values that share the same key in a executor before global accumulation. Reduces the size of the exchange if there are duplicated keys in multiple partitions.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, V)

sortByKey(ascending, numPartitions, src)

Sort the keys using their natural order.

Parameters
  • ascending (Boolean) – Allows you to choose between ascending and descending order.

  • numPartitions (Integer) – Number of resulting partitions.

  • src (ISource) – (Optional) Auxiliary function to configure executor, its use may vary between languages. Must implement at east IBeforeFunction interface.

Returns

A Dataframe of pairs with result elements.

Return type

IPairDataFrame(K, V)

keys()

Retrieve unique keys.

Returns

The unique keys.

Return type

List(K)

values()

Retrieve unique values.

Returns

The unique values.

Return type

List(V)

sampleByKey(withReplacement, fractions, seed, native)

Generates a random sample records from the values that share the same key.

Parameters
  • withReplacement (Boolean) – An element can be selected more than once.

  • fraction (Map(K, Float)) – Percentage of the sample by key. Absences are taken as zero.

  • seed (Integer) – Initializes the random number generator.

  • native (Boolean) – (Optional) sends fractions with native serialization.

Returns

A Dataframe with result elements.

Return type

IDataFrame(T)

countByKey()

Count unique keys.

Returns

Number unique of values.

Return type

Integer

countByValue()

Count unique keys.

Returns

Number unique of values.

Return type

Integer

class ICacheLevel
NO_CACHE: Integer = 0

Elements cache is disabled.

PRESERVE: Integer = 1

Elements will be cached in the same storage in which it is stored.

MEMORY: Integer = 2

Elements will be cached on memory storage.

RAW_MEMORY: Integer = 3

Elements will be cached on raw memory storage.

DISK: Integer = 4

Elements will be cached on disk storage.

IDriverException

The class IDriverException represents an execution error. Exceptions are defined together with the function that generates them, but they are actually thrown by the function that causes the execution.

class IDriverException

Executor

class IContext

The executor context allows the API functions to interact with the rest of the IgnisHPC system.

cores()
Returns

Number of cores assigned to the executor.

Return type

Integer

executors()
Returns

Number of executors.

Return type

Integer

executorId()
Returns

Unique identifier of the executor, a number greater than or equal to zero and less than the number of executors.

Return type

Integer

threadId()
Returns

Unique identifier of the current thread, a number greater than or equal to zero and less than the than the number of cores.

Return type

Integer

mpiGroup()
Returns

Returns the mpi group of the executors.

props()
Returns

Driver IProperties as Map object.

Return type

Map(String, String)

vars()

(This function may vary depending on the implementation.)

Returns

Variables sent by ISource.addParam as Map object.

Return type

Map(String, Any)

class IReadIterator

Transverse through elements of a partition.

hasNext()
Returns

True if elements remain

Return type

Boolean

next()
Returns

Next element.

class IBeforeFunction
before(context)
Parameters

context (IContext) – The executor context.

class IVoidFunction0
before(context)
Parameters

context (IContext) – The executor context.

call(context)
Parameters

context (IContext) – The executor context.

after(context)
Parameters

context (IContext) – The executor context.

class IVoidFunction
before(context)
Parameters

context (IContext) – The executor context.

call(context, v)
Parameters
  • context (IContext) – The executor context.

  • v – Argument

after(context)
Parameters

context (IContext) – The executor context.

class IVoidFunction2
before(context)
Parameters

context (IContext) – The executor context.

call(context, v1, v2)
Parameters
  • context (IContext) – The executor context.

  • v1 – Argument 1

  • v2 – Argument 2

after(context)
Parameters

context (IContext) – The executor context.

class IFunction0
before(context)
Parameters

context (IContext) – The executor context.

call(context)
Parameters

context (IContext) – The executor context.

Returns

This function must return a value.

after(context)
Parameters

context (IContext) – The executor context.

class IFunction
before(context)
Parameters

context (IContext) – The executor context.

call(context, v)
Parameters
  • context (IContext) – The executor context.

  • v – Argument

Returns

This function must return a value.

after(context)
Parameters

context (IContext) – The executor context.

class IFunction2
before(context)
Parameters

context (IContext) – The executor context.

call(context, v1, v2)
Parameters
  • context (IContext) – The executor context.

  • v1 – Argument 1

  • v2 – Argument 2

Returns

This function must return a value.

after(context)
Parameters

context (IContext) – The executor context.