Task runner that executes a task inside a job in Azure Batch.

This plugin is only available in the Enterprise Edition (EE).

This task runner is container-based, so the containerImage property must be set.

To access the task's working directory, use the {{workingDir}} Pebble expression or the WORKING_DIR environment variable. Input files and namespace files will be available in this directory.
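
For example, a Shell task can reference that directory either way; a minimal sketch (the file name data.txt is illustrative):

```yaml
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.azure.runner.Batch
    commands:
      - cat {{workingDir}}/data.txt    # Pebble expression
      - cat "$WORKING_DIR/data.txt"    # equivalent environment variable
```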

To generate output files, you can either use the task's outputFiles property and create a file with the same name in the task's working directory, or create any file in the output directory, which can be accessed via the {{outputDir}} Pebble expression or the OUTPUT_DIR environment variable.
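
A minimal sketch of both approaches (file names are illustrative):

```yaml
tasks:
  - id: outputs
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    taskRunner:
      type: io.kestra.plugin.ee.azure.runner.Batch
    outputFiles:
      - result.txt
    commands:
      # 1. Declare the file in outputFiles, then create it in the working directory.
      - echo "done" > {{workingDir}}/result.txt
      # 2. Or write any file into the output directory.
      - echo "extra" > "$OUTPUT_DIR/extra.txt"
```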

To use the inputFiles, outputFiles, or namespaceFiles properties, make sure to set the blobStorage property. The blob storage serves as an intermediary storage layer for the task runner. Input and namespace files will be uploaded to the blob container before the task run. Similarly, the task runner will store outputFiles in this blob storage during the task run. In the end, the task runner will make those files available for download and preview from the UI by sending them to internal storage.

The task runner will generate a folder in the configured blobStorage for each task run. You can access that folder using the {{bucketPath}} Pebble expression or the BUCKET_PATH environment variable. There are two supported ways to provide authentication for the blob storage (a sketch of the second follows the list):

  • connectionString and containerName properties
  • containerName, endpoint, sharedKeyAccountName and sharedKeyAccountAccessKey properties
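
The connection-string style is shown in the examples below. A minimal sketch of the shared-key style, with assumed secret names and a placeholder storage endpoint:

```yaml
blobStorage:
  containerName: "{{ vars.container_name }}"
  endpoint: "https://mystorageaccount.blob.core.windows.net"  # placeholder endpoint
  sharedKeyAccountName: "{{ secret('AZURE_STORAGE_ACCOUNT_NAME') }}"
  sharedKeyAccountAccessKey: "{{ secret('AZURE_STORAGE_ACCESS_KEY') }}"
```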

Note that when the Kestra Worker running this task is terminated, the batch job will still run until completion. After the Worker restarts, it will resume processing of the existing job unless resume is set to false.
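
If you prefer a restarted Worker to submit a fresh job instead of reattaching, disable this behavior; a minimal sketch:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.azure.runner.Batch
  resume: false  # do not reattach to an existing job after a Worker restart
```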

```yaml
type: "io.kestra.plugin.ee.azure.runner.Batch"
```

Execute a Shell command.

```yaml
id: new_shell
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.ee.azure.runner.Batch
      account: "{{secrets.account}}"
      accessKey: "{{secrets.accessKey}}"
      endpoint: "{{secrets.endpoint}}"
      poolId: "{{vars.poolId}}"
    commands:
      - echo "Hello World"
```

Pass input files to the task, execute a Shell command, then retrieve output files.

```yaml
id: new_shell_with_file
namespace: company.team

inputs:
  - id: file
    type: FILE

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{inputs.file}}"
    outputFiles:
      - out.txt
    containerImage: centos
    taskRunner:
type: io.kestra.plugin.ee.azure.runner.Batch
      account: "{{secrets.account}}"
      accessKey: "{{secrets.accessKey}}"
      endpoint: "{{secrets.endpoint}}"
      poolId: "{{vars.poolId}}"
      blobStorage:
        connectionString: "{{secrets.connectionString}}"
        containerName: "{{vars.containerName}}"
    commands:
      - cp {{workingDir}}/data.txt {{workingDir}}/out.txt
```

Run a Python script to fetch environment information on Azure Batch VMs.

```yaml
id: azure_batch_runner
namespace: company.team

variables:
  pool_id: poolId
  container_name: containerName

tasks:
  - id: scrape_environment_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.ee.azure.runner.Batch
      account: "{{ secret('AZURE_ACCOUNT') }}"
      accessKey: "{{ secret('AZURE_ACCESS_KEY') }}"
      endpoint: "{{ secret('AZURE_ENDPOINT') }}"
      poolId: "{{ vars.pool_id }}"
      blobStorage:
        containerName: "{{ vars.container_name }}"
        connectionString: "{{ secret('AZURE_CONNECTION_STRING') }}"
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - environment_info.json
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json

        from kestra import Kestra

        print("Hello from Azure Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = 'environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        if __name__ == '__main__':
            print_environment_info()
```

Properties

The Batch access key.

The Batch account name.

The blob service endpoint.

Id of the pool on which to run the job.

Default PT5S
Format duration

Determines how often Kestra should poll the container for completion. By default, the task runner checks every 5 seconds whether the job is completed. You can set this to a lower value (e.g., PT0.1S = every 100 milliseconds) for quick jobs and to a higher value (e.g., PT1M = every minute) for long-running jobs. Setting this property to a higher value will reduce the number of API calls Kestra makes to the remote service; keep that in mind in case you see API rate limit errors.
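
In other Kestra task runners this interval is exposed as completionCheckInterval; assuming the same property name here, a sketch tuned for a long-running job:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.azure.runner.Batch
  completionCheckInterval: PT1M  # assumed property name; poll once per minute
```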

Default true

Whether the job should be deleted upon completion.

Warning: if the job is not deleted, a retry of the task could resume an old failed attempt of the job.
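
Assuming the property is named delete, a sketch that keeps the job for post-mortem inspection, subject to the retry caveat above:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.azure.runner.Batch
  delete: false  # assumed property name; the Batch job is kept after completion
```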

The private registry which contains the container image.

Default true

Whether to reconnect to the current job if it already exists.

Default false

Enable log streaming during task execution.

This property is useful for capturing logs from tasks that have a timeout. If a task with a timeout is terminated, this property makes sure all logs up to that point are retrieved.

Validation RegExp \d+\.\d+\.\d+(-[a-zA-Z0-9-]+)?|([a-zA-Z0-9]+)

The version of the plugin to use.

Default PT1H
Format duration

The maximum duration to wait for job completion, unless the task's timeout property is set, in which case the task timeout takes precedence over this property.

Azure Batch will automatically time out the job once this duration is reached, and the task will fail.
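
Because the task-level timeout takes precedence, you can cap a job below this default; a minimal sketch (the command is illustrative):

```yaml
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    timeout: PT30M  # takes precedence over the runner's wait duration
    taskRunner:
      type: io.kestra.plugin.ee.azure.runner.Batch
    commands:
      - ./long_running_job.sh
```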

The URL of the blob container the compute node should use.

Mandatory if you want to use the namespaceFiles, inputFiles, or outputFiles properties.

Connection string of the Storage Account.

The blob service endpoint.

Shared Key access key for authenticating requests.

Shared Key account name for authenticating requests.

The reference to the user-assigned identity to use to access the Azure Container Registry instead of a username and password.

The password to log into the registry server.

The registry server URL.

If omitted, the default is "docker.io".

The user name to log into the registry server.

The ARM resource ID of the user assigned identity.
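
A sketch of pulling the container image from a private Azure Container Registry, assuming the nested property is named registry and that the field names match the descriptions above:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.azure.runner.Batch
  registry:
    registryServer: "myregistry.azurecr.io"  # placeholder registry URL
    userName: "{{ secret('REGISTRY_USERNAME') }}"
    password: "{{ secret('REGISTRY_PASSWORD') }}"
```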