Create a smart task

A smart task is a task that uses an AI code generator to solve a specific problem. The generator interprets a request written in plain language and writes code that fulfills it. The task then runs this code automatically to perform the requested operation.

For example, given a table, you can ask the Smart table transformer task to manipulate it in plain language, with requests such as "remove some columns" or "do some calculations".

Creating a smart task is straightforward. The following sections show how to create one.

Introduction

In your brick, create a task that extends SmartTaskBase (from gws_core). The SmartTaskBase class manages the call to OpenAI and the execution of the generated code. You only need to describe what you want to do, provide the inputs, and extract the outputs.

When you extend SmartTaskBase, you must override three methods:

get_context: builds the context of the OpenAI chat. The context gives the AI precise information so that it generates the code you want.

build_openai_code_inputs: builds the variables that are accessible in the generated code.

build_task_outputs: builds the task output resources from the results of the generated code.
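To show how the three methods fit together, here is a toy model in plain Python. MiniSmartTask, DoubleValues and fake_generator are hypothetical: this is not SmartTaskBase's implementation. The model assumes the generated code is run with Python's exec() on the dictionary returned by build_openai_code_inputs, replaces the OpenAI call with a fixed answer, and leaves out the params argument for brevity.

```python
from typing import Any, Callable, Dict


class MiniSmartTask:
    """Toy stand-in for SmartTaskBase (hypothetical, for illustration only)."""

    def __init__(self, generate_code: Callable[[str, str], str]) -> None:
        # generate_code(context, prompt) -> Python source; replaces the OpenAI call
        self.generate_code = generate_code

    def get_context(self, inputs: Dict[str, Any]) -> str:
        raise NotImplementedError

    def build_openai_code_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError

    def build_task_outputs(self, inputs: Dict[str, Any], code_outputs: Dict[str, Any],
                           generated_code: str) -> Dict[str, Any]:
        raise NotImplementedError

    def run(self, inputs: Dict[str, Any], prompt: str) -> Dict[str, Any]:
        # 1. describe the problem to the AI
        context = self.get_context(inputs)
        # 2. obtain code for the user's request (stubbed here)
        code = self.generate_code(context, prompt)
        # 3. expose the selected variables to the generated code
        namespace = self.build_openai_code_inputs(inputs)
        # 4. run the code; the variables it assigns land in namespace
        exec(code, namespace)
        # 5. convert the variables produced by the code into task outputs
        return self.build_task_outputs(inputs, namespace, code)


class DoubleValues(MiniSmartTask):
    def get_context(self, inputs):
        return "Generate Python code that transforms the list 'source'. Assign the result to 'target'."

    def build_openai_code_inputs(self, inputs):
        return {"source": list(inputs["source"])}

    def build_task_outputs(self, inputs, code_outputs, generated_code):
        return {"target": code_outputs["target"], "generated_code": generated_code}


def fake_generator(context: str, prompt: str) -> str:
    # Always returns the same code, in place of a real AI answer.
    return "target = [x * 2 for x in source]"


result = DoubleValues(fake_generator).run({"source": [1, 2, 3]}, "double every value")
print(result["target"])  # [2, 4, 6]
```

The subclass only describes the problem and converts data in and out. Calling the generator and running the code stays in the base class, which mirrors the split of responsibilities described above.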

Just like any other task, you can set up inputs, outputs, and other settings.

Let's see an example of how we did this for the TableSmartTransformer task.

Inputs, outputs and config

SmartTaskBase already defines a config to interact with OpenAI. If you don't need extra config parameters, you can skip the definition of config_specs. If you do, add them to the existing ones like this:

config_specs: ConfigSpecs = {
    # get the OpenAI config specs
    **SmartTaskBase.config_specs,
    # add custom config specs
    "keep_columns_tags": BoolParam(default_value=False, human_name="Keep columns tags",
                                   short_description="If true, the columns tags are kept in the output table for columns that have the same names."),
    "keep_rows_tags": BoolParam(default_value=False, human_name="Keep rows tags",
                                short_description="If true, the rows tags are kept in the output table for rows that have the same names."),
}
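The ** unpacking copies the base entries into a new dictionary. As a result, SmartTaskBase.config_specs itself is not modified, and entries written after it take precedence. The plain-dict model below shows both points. Its keys and string values are hypothetical placeholders, not the real OpenAI params of SmartTaskBase.

```python
# Hypothetical stand-in for SmartTaskBase.config_specs: the keys and string
# values are placeholders, not the real gws_core OpenAI params.
BASE_CONFIG_SPECS = {"base_param_1": "<base param 1>", "base_param_2": "<base param 2>"}

# Same pattern as in the task: unpack the base specs first, then add custom ones.
custom_specs = {
    **BASE_CONFIG_SPECS,
    "keep_columns_tags": "<bool param>",
    # Reusing a base key replaces the base entry, because later keys win.
    "base_param_2": "<overridden param>",
}

print(sorted(custom_specs))  # ['base_param_1', 'base_param_2', 'keep_columns_tags']
print(BASE_CONFIG_SPECS["base_param_2"])  # <base param 2>
```

Order matters. If **SmartTaskBase.config_specs were written last, any base entry with the same key would silently replace yours.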

Get context

In the get_context method, we define a clear context so that the AI knows exactly what code to generate. This increases the chance that the code executes successfully. This part is critical: expect to iterate several times on the wording before the AI reliably generates the code you want.

Here is the get_context method of TableSmartTransformer:

def get_context(self, params: ConfigParams, inputs: TaskInputs) -> str:
    # get the table
    source: Table = inputs["source"]
    context = "You are a developer assistant that generate code in python to transform a dataframe."
    context += f"\n{OpenAiHelper.get_code_context([GwsCorePackages.PANDAS, GwsCorePackages.NUMPY])}"
    context += f"\nThe dataframe has {source.nb_rows} rows and {source.nb_columns} columns."
    # start a new line so this instruction is not glued to the previous sentence
    context += "\nThe transformed dataframe must be assigned to a variable named 'target'."
    return context

Here are a few tips for writing the context:

Clearly state the purpose of the generation. Here we ask to 'generate code in python to transform a dataframe'.

Make sure the objects you manipulate are known by the AI (ChatGPT). Here we use a DataFrame, which is a well-known object.

Clearly define the names and types of the output variables.

Make sure the generated code can be executed directly (for example, it must not be wrapped in a function). We recommend using the OpenAiHelper.get_code_context method to define the code generation rules. This method also lets you list the available libraries and their versions.

Provide information about the inputs if needed (like the size of the dataframe). Be careful not to send sensitive information to the API.
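The tips above can be sketched as a small helper that puts each instruction on its own line. build_context is hypothetical. The code_rules argument is a placeholder string standing in for the output of OpenAiHelper.get_code_context, whose exact wording is not shown here. Joining the lines with "\n" also prevents two sentences from being glued together when a separator is forgotten.

```python
from typing import List


def build_context(purpose: str, code_rules: str, input_facts: List[str], output_name: str) -> str:
    """Hypothetical helper: assemble the prompt context, one instruction per line."""
    lines = [
        # tip: state the purpose of the generation
        f"You are a developer assistant that {purpose}.",
        # tip: rules that keep the code directly executable
        code_rules,
        # tip: non-sensitive facts about the inputs, such as their size
        *input_facts,
        # tip: name the output variable explicitly
        f"The result must be assigned to a variable named '{output_name}'.",
    ]
    # "\n".join puts every instruction on its own line, with no missing separators
    return "\n".join(lines)


context = build_context(
    purpose="generates Python code to transform a dataframe",
    # placeholder for OpenAiHelper.get_code_context(...) in a real task
    code_rules="Only return Python code that can be executed directly, not a function.",
    input_facts=["The dataframe has 10 rows and 3 columns."],
    output_name="target",
)
print(context)
```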

Build inputs

The build_openai_code_inputs method must return a dictionary of variables that will be accessible in the generated code. You can build these variables from the task inputs or create new ones. You can also convert the task input resources to objects that OpenAI knows. For example, here we pass a DataFrame rather than the Table, because OpenAI knows the DataFrame object.

def build_openai_code_inputs(self, params: ConfigParams, inputs: TaskInputs) -> dict:
    # get the table
    source: Table = inputs["source"]
    # pass the dataframe as input
    return {"source": source.get_data()}
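The variables you return are the only task data that the generated code can see. The sketch below models this with Python's built-in exec(). That execution mechanism is an assumption, and SmartTaskBase may run the code differently. The nested lists stand in for the DataFrame.

```python
# Assumption: generated code runs via exec() with the dictionary returned by
# build_openai_code_inputs as its global namespace.
code_inputs = {"source": [[1, 2], [3, 4]]}  # plain rows standing in for source.get_data()

# The generated code can read every name passed in the dictionary...
exec("target = [row[0] for row in source]", code_inputs)
print(code_inputs["target"])  # [1, 3]

# ...but not objects you did not pass, such as the Table resource itself.
try:
    exec("target = table.get_data()", {"source": [[1, 2]]})
except NameError as error:
    print(error)  # name 'table' is not defined
```

After exec() runs, the same dictionary also holds every variable the code assigned (here target) plus a __builtins__ entry that Python inserts. For this reason, outputs should be read by name rather than by iterating over the whole dictionary.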

Build outputs

In the build_task_outputs method, you receive a dictionary of the variables available after the generated code runs. Extract the outputs from this dictionary and convert them to Resources. Make sure the keys you read match the variable names given in the context, here target.
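The checks that build_task_outputs performs on target can be generalized to any output variable. The hypothetical helper below is not part of gws_core. It distinguishes a missing variable from one of the wrong type, and raises ValueError or TypeError instead of a bare Exception.

```python
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def get_code_output(code_outputs: Dict[str, Any], name: str, expected_type: Type[T]) -> T:
    """Hypothetical helper: return code_outputs[name] after checking it exists and has the right type."""
    if name not in code_outputs:
        raise ValueError(f"The generated code did not define a variable named '{name}'")
    value = code_outputs[name]
    if not isinstance(value, expected_type):
        raise TypeError(f"'{name}' must be a {expected_type.__name__}, got {type(value).__name__}")
    return value


print(get_code_output({"target": [1, 3]}, "target", list))  # [1, 3]
```

In the table transformer, the equivalent call would be get_code_output(code_outputs, "target", DataFrame).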

The build_task_outputs method is also where you can build, from the generated code, a script that is compatible with live tasks. The user can then copy this script and use it directly in a Live task.

def build_task_outputs(self, params: ConfigParams, inputs: TaskInputs,
                       code_outputs: dict, generated_code: str) -> dict:
    output = code_outputs.get("target", None)
    if output is None:
        raise Exception("The code did not generate any output")
    if not isinstance(output, DataFrame):
        raise Exception("The output must be a pandas DataFrame")
    # make the output code compatible with the live task
    live_task_code = f"""
from gws_core import Table
# keep the original table
source_table = sources[0]
# retrieve the dataframe for the generated code
source = sources[0].get_data()
{generated_code}
# convert the dataframe to a table
table_target = Table(target)
"""
    result = Table(output)
    # get the source table
    source: Table = inputs["source"]
    # manage the tag options
    if params.get_value("keep_columns_tags"):
        # copy the column tags from the source table to the target table
        result.copy_column_tags_by_name(source)
        # update the live task code to copy the tags
        live_task_code += "\ntable_target.copy_column_tags_by_name(source_table)"
    if params.get_value("keep_rows_tags"):
        # copy the row tags, both in the task and in the live task code
        result.copy_row_tags_by_name(source)
        live_task_code += "\ntable_target.copy_row_tags_by_name(source_table)"
    # set the outputs as a list named targets
    live_task_code += "\ntargets = [table_target]"
    generated_text = Text(live_task_code)
    generated_text.name = "Table transformation code"
    return {'target': result, 'generated_code': generated_text}
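Because the user copies the live task code as-is, it can help to check that the assembled script at least parses. build_live_task_code is a hypothetical helper and not part of gws_core. It rebuilds the same script as a list of lines (without the comments) and passes it to Python's built-in compile(). compile() parses the script without executing it, so gws_core does not need to be importable at that point.

```python
def build_live_task_code(generated_code: str, keep_columns_tags: bool, keep_rows_tags: bool) -> str:
    """Hypothetical helper: wrap generated code in a live task script and check its syntax."""
    lines = [
        "from gws_core import Table",
        "source_table = sources[0]",
        "source = source_table.get_data()",
        generated_code,
        "table_target = Table(target)",
    ]
    if keep_columns_tags:
        lines.append("table_target.copy_column_tags_by_name(source_table)")
    if keep_rows_tags:
        lines.append("table_target.copy_row_tags_by_name(source_table)")
    lines.append("targets = [table_target]")
    code = "\n".join(lines)
    # compile() only parses the script: it neither runs it nor imports gws_core,
    # but it raises SyntaxError if the generated code is not valid Python.
    compile(code, "<live_task>", "exec")
    return code


script = build_live_task_code("target = source.dropna()", keep_columns_tags=True, keep_rows_tags=False)
print(script.splitlines()[-2])  # table_target.copy_column_tags_by_name(source_table)
```

A SyntaxError raised here can be turned into a clear task error before the user ever copies the script.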

Complete code

from pandas import DataFrame

# Assumption: these names are exported by the top-level gws_core package;
# adjust the import paths if your gws_core version exposes them elsewhere.
from gws_core import (BoolParam, ConfigParams, ConfigSpecs, GwsCorePackages,
                      InputSpec, InputSpecs, OpenAiHelper, OutputSpec,
                      OutputSpecs, SmartTaskBase, Table, TaskInputs, Text,
                      task_decorator)


@task_decorator("SmartTableTransformer", human_name="Smart table transformer",
                short_description="Table transformer that uses AI (OpenAI).")
class TableSmartTransformer(SmartTaskBase):
    """
    This task is still in beta.
    This task uses the OpenAI API to generate Python code that transforms a dataframe. This code is then automatically executed.
    Warning: this task does not support table tags.
    The table data is not transferred to OpenAI, only the provided text.
    """
    input_specs: InputSpecs = InputSpecs({
        'source': InputSpec(Table),
    })
    output_specs: OutputSpecs = OutputSpecs({
        'target': OutputSpec(Table),
        'generated_code': SmartTaskBase.generated_code_output
    })
    config_specs: ConfigSpecs = {
        # get the OpenAI config specs
        **SmartTaskBase.config_specs,
        # add custom config specs
        "keep_columns_tags": BoolParam(default_value=False, human_name="Keep columns tags",
                                       short_description="If true, the columns tags are kept in the output table for columns that have the same names."),
        "keep_rows_tags": BoolParam(default_value=False, human_name="Keep rows tags",
                                    short_description="If true, the rows tags are kept in the output table for rows that have the same names."),
    }

    def get_context(self, params: ConfigParams, inputs: TaskInputs) -> str:
        # get the table
        source: Table = inputs["source"]
        context = "You are a developer assistant that generate code in python to transform a dataframe."
        context += f"\n{OpenAiHelper.get_code_context([GwsCorePackages.PANDAS, GwsCorePackages.NUMPY])}"
        context += f"\nThe dataframe has {source.nb_rows} rows and {source.nb_columns} columns."
        # start a new line so this instruction is not glued to the previous sentence
        context += "\nThe transformed dataframe must be assigned to a variable named 'target'."
        return context

    def build_openai_code_inputs(self, params: ConfigParams, inputs: TaskInputs) -> dict:
        # get the table
        source: Table = inputs["source"]
        # pass the dataframe as input
        return {"source": source.get_data()}

    def build_task_outputs(self, params: ConfigParams, inputs: TaskInputs,
                           code_outputs: dict, generated_code: str) -> dict:
        output = code_outputs.get("target", None)
        if output is None:
            raise Exception("The code did not generate any output")
        if not isinstance(output, DataFrame):
            raise Exception("The output must be a pandas DataFrame")
        # make the output code compatible with the live task
        live_task_code = f"""
from gws_core import Table
# keep the original table
source_table = sources[0]
# retrieve the dataframe for the generated code
source = sources[0].get_data()
{generated_code}
# convert the dataframe to a table
table_target = Table(target)
"""
        result = Table(output)
        # get the source table
        source: Table = inputs["source"]
        # manage the tag options
        if params.get_value("keep_columns_tags"):
            # copy the column tags from the source table to the target table
            result.copy_column_tags_by_name(source)
            # update the live task code to copy the tags
            live_task_code += "\ntable_target.copy_column_tags_by_name(source_table)"
        if params.get_value("keep_rows_tags"):
            # copy the row tags, both in the task and in the live task code
            result.copy_row_tags_by_name(source)
            live_task_code += "\ntable_target.copy_row_tags_by_name(source_table)"
        # set the outputs as a list named targets
        live_task_code += "\ntargets = [table_target]"
        generated_text = Text(live_task_code)
        generated_text.name = "Table transformation code"
        return {'target': result, 'generated_code': generated_text}