Pull and ingest data from a third-party API

This tutorial requires multiple libraries, which can push your deployment package past Lambda's 250 MB (unzipped) size limit for .zip deployments. Packaging the function as a Docker container image raises the limit to 10 GB, giving you much more flexibility in libraries and dependencies. For more about AWS Lambda container support, see the AWS documentation.

The libraries used in this tutorial are pandas, requests, pgcopy, and psycopg2-binary.

Extract, transform, and load (ETL) functions pull data from one source and ingest it into another. In this tutorial, the ETL function pulls data from the Alpha Vantage finance API and inserts it into TimescaleDB. The function connects to TimescaleDB using connection values stored in environment variables.
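As an illustration of the three stages, here is a minimal sketch. It is not the tutorial's own function, and it assumes the Alpha Vantage TIME_SERIES_INTRADAY endpoint, a hypothetical stocks_intraday table with the columns in COLUMNS, and hypothetical DB_USER, DB_PASS, DB_HOST, DB_PORT, and DB_NAME environment variables. Extract and transform use only the standard library; the load step uses psycopg2 and pgcopy's CopyManager, imported inside load() so the other steps run without them.

```python
import json
import os
from datetime import datetime
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical column layout for the target table.
COLUMNS = ["time", "symbol", "price_open", "price_high", "price_low", "price_close", "trading_volume"]


def connection_string(env=os.environ):
    """Build a TimescaleDB (PostgreSQL) URI from hypothetical DB_* environment variables."""
    return "postgres://{user}:{password}@{host}:{port}/{dbname}".format(
        user=quote(env["DB_USER"], safe=""),
        password=quote(env["DB_PASS"], safe=""),  # escape characters such as @ or /
        host=env["DB_HOST"],
        port=env.get("DB_PORT", "5432"),
        dbname=env["DB_NAME"],
    )


def extract(symbol, api_key, interval="5min"):
    """Extract: fetch intraday prices for one symbol from the Alpha Vantage API."""
    query = urlencode({
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "apikey": api_key,
    })
    with urlopen("https://www.alphavantage.co/query?" + query, timeout=30) as resp:
        return json.load(resp)


def transform(payload, symbol, interval="5min"):
    """Transform: flatten the JSON time series into rows ordered by time."""
    series = payload[f"Time Series ({interval})"]
    rows = [
        (
            datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
            symbol,
            float(bar["1. open"]),
            float(bar["2. high"]),
            float(bar["3. low"]),
            float(bar["4. close"]),
            int(bar["5. volume"]),
        )
        for ts, bar in series.items()
    ]
    return sorted(rows)


def load(rows, conn_str, table="stocks_intraday"):
    """Load: bulk-copy rows into TimescaleDB with pgcopy."""
    import psycopg2
    from pgcopy import CopyManager

    conn = psycopg2.connect(conn_str)
    try:
        CopyManager(conn, table, COLUMNS).copy(rows)
        conn.commit()
    finally:
        conn.close()
```

A run then chains the stages: load(transform(extract("AAPL", api_key), "AAPL"), connection_string()). Percent-encoding the user name and password keeps the URI valid when a password contains characters such as @.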

This is the ETL function used in this tutorial:

Add a requirements file

When you have created the ETL function, list the libraries it needs in a text file called requirements.txt in your project, so that they can be installed when the container image is built. This is the requirements.txt file used in this tutorial:

    pandas
    requests
    pgcopy
    psycopg2-binary
Note: This example uses psycopg2-binary instead of psycopg2 in the requirements.txt file. The binary version of the library contains all its dependencies, so that you don’t need to install them separately.

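When you have the requirements set up, you can create the Dockerfile for the project:

  1. Start from the AWS Lambda base image for Python:

      FROM public.ecr.aws/lambda/python:3.8

  2. Copy the function code and the requirements file into the image's working directory:

      COPY function.py requirements.txt ./

  3. Install the libraries using the requirements file:

      RUN pip install -r requirements.txt

  4. Set the handler that Lambda calls, in the form <file name>.<function name>:

      CMD ["function.handler"]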
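The CMD value function.handler tells the Lambda runtime to import function.py and call its handler(event, context) function on every invocation. The skeleton below is a hypothetical minimal handler showing that contract; the returned dictionary shape is an illustrative choice, not something a scheduled invocation requires.

```python
import json


def handler(event, context):
    """Entry point named by CMD ["function.handler"] in the Dockerfile.

    Lambda imports function.py and calls handler(event, context) on each invocation.
    """
    # Scheduled EventBridge events set "source" to "aws.events"; a manual test event may not.
    source = event.get("source", "manual") if isinstance(event, dict) else "manual"
    # Hypothetical placeholder: run the extract, transform, and load steps here.
    return {"statusCode": 200, "body": json.dumps({"triggered_by": source})}
```

If you rename the file or the function, update the CMD line to match, or the runtime will fail to find the handler.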

Upload the image to ECR

To connect the container image to a Lambda function, you need to upload it to Amazon Elastic Container Registry (ECR).

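  1. Authenticate the Docker command line interface with your ECR registry:

      aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com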

  2. Build the image:

      docker build -t lambda-image .

  3. Create a repository in ECR. In this example, the repository is called lambda-image:

      aws ecr create-repository --repository-name lambda-image

  4. Tag your image using the same name as the repository:

      docker tag lambda-image:latest <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest

  5. Push the image to the repository:

      docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest
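The tagged name is the ECR image URI, built from your 12-digit account ID, the Region, the repository name, and a tag. Here is a small helper that makes the format explicit; the function name is hypothetical, and the account ID in the usage line is a placeholder.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    # AWS account IDs are always 12 digits; catching a typo here saves a failed push.
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("123456789012", "us-east-1", "lambda-image"))
# prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest
```

The same string is what you push to ECR and what you later give Lambda as the image URI, so building it in one place keeps those commands consistent.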

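To create a Lambda function from your container image, use the Lambda create-function command. Set the --package-type parameter to Image, add the ECR image URI using the --code flag, and pass the function's execution role with --role, which create-function requires:

    aws lambda create-function --function-name <FUNCTION_NAME> \
      --package-type Image \
      --code ImageUri=<AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest \
      --role <ARN_LAMBDA_EXECUTION_ROLE>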
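Because the ETL function reads its connection values from environment variables, those variables also have to be set on the function. The sketch below, with a hypothetical function name, assembles the create-function arguments as a list and optionally adds --environment, passing it as JSON, which the AWS CLI accepts for structured parameters.

```python
import json


def create_function_command(function_name, image_uri, role_arn, env=None):
    """Return the argv for `aws lambda create-function` from a container image."""
    cmd = [
        "aws", "lambda", "create-function",
        "--function-name", function_name,
        "--package-type", "Image",          # container image instead of a .zip package
        "--code", f"ImageUri={image_uri}",  # the ECR image URI
        "--role", role_arn,                 # execution role the function runs as
    ]
    if env:
        # Environment variables the function reads at runtime, e.g. database settings.
        cmd += ["--environment", json.dumps({"Variables": env})]
    return cmd
```

Running it with subprocess.run(create_function_command(...), check=True) passes each argument separately, so the JSON needs no shell quoting.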

Schedule the Lambda function

If you want to run your Lambda function on a schedule, you can set up an Amazon EventBridge trigger. This creates an EventBridge rule that uses a cron expression.

  1. Create the schedule. In this example, the function runs every day at 9 AM UTC (EventBridge evaluates cron expressions in UTC):

      aws events put-rule --name schedule-lambda --schedule-expression 'cron(0 9 * * ? *)'
  2. Grant EventBridge permission to invoke the Lambda function:

      aws lambda add-permission --function-name <FUNCTION_NAME> \
        --statement-id my-scheduled-event --action 'lambda:InvokeFunction' \
        --principal events.amazonaws.com
  3. Add the function to the EventBridge rule. Create a targets.json file containing a unique ID for the target and the ARN of the Lambda function:

      [
        {
          "Id": "docker_lambda_trigger",
          "Arn": "<ARN_LAMBDA_FUNCTION>"
        }
      ]

     Then attach the target to the rule:

      aws events put-targets --rule schedule-lambda --targets file://targets.json
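The <ARN_LAMBDA_FUNCTION> placeholder follows the standard Lambda ARN format. The helpers below, whose names are hypothetical, build that ARN and the one-entry target list for targets.json; they assume the standard aws partition, so ARNs in the China or GovCloud partitions would differ.

```python
import json
from pathlib import Path


def lambda_function_arn(region, account_id, function_name):
    """Lambda ARN in the standard partition: arn:aws:lambda:<region>:<account>:function:<name>."""
    return f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"


def build_targets(target_id, function_arn):
    """The list targets.json holds: one target with a unique Id and the function ARN."""
    return [{"Id": target_id, "Arn": function_arn}]


def write_targets_file(path, targets):
    """Write the targets as JSON so the CLI can read them from file://<path>."""
    Path(path).write_text(json.dumps(targets, indent=2) + "\n")
```

For example, write_targets_file("targets.json", build_targets("docker_lambda_trigger", lambda_function_arn("us-east-1", "123456789012", "etl"))) produces a file with the same structure as the one in step 3.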
Important: If you get an error saying Parameter ScheduleExpression is not valid, you might have made a mistake in the cron expression. Check the cron expression examples in the Amazon EventBridge documentation.
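Two structural mistakes account for many of these errors when coming from Unix cron: EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of day-of-month and day-of-week must be ?. The rough pre-check below, a hypothetical helper, catches only those two cases plus a missing cron(...) wrapper; it does not validate field ranges and is no substitute for EventBridge's own validation.

```python
def check_eventbridge_cron(expression):
    """Rough structural check of an EventBridge cron(...) expression; returns an error string or None."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        return "expected the form cron(...)"
    fields = expression[len("cron("):-1].split()
    # EventBridge cron has six fields: minutes hours day-of-month month day-of-week year.
    if len(fields) != 6:
        return f"expected 6 fields, got {len(fields)}"
    day_of_month, day_of_week = fields[2], fields[4]
    # One of day-of-month and day-of-week must be '?'.
    if day_of_month != "?" and day_of_week != "?":
        return "use '?' in either day-of-month or day-of-week"
    return None
```

With the expression from this tutorial, check_eventbridge_cron("cron(0 9 * * ? *)") returns None, while the five-field Unix form "cron(0 9 * * *)" is reported as having 5 fields.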