Pull and ingest data from a third-party API
This tutorial requires multiple libraries, which can push your deployment package past the 250 MB (unzipped) limit for Lambda .zip deployment packages. Packaging the function as a Docker container image raises the limit to 10 GB, giving you much more flexibility in libraries and dependencies. For more about AWS Lambda container image support, see the AWS documentation.
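The following small sketch makes the two limits concrete by picking a packaging format from an estimated unzipped size. The helper name `choose_package_type` is hypothetical. The returned strings are the two values the Lambda `--package-type` option accepts. The thresholds are the 250 MB and 10 GB figures above, with 10 GB taken as 10,240 MB as an assumption.

```python
# Hypothetical helper: pick a Lambda packaging format from an estimated size.
# Thresholds follow the limits quoted above; 10 GB is treated as 10 * 1024 MB.
ZIP_LIMIT_MB = 250            # .zip deployment package, unzipped
IMAGE_LIMIT_MB = 10 * 1024    # container image


def choose_package_type(unzipped_size_mb: float) -> str:
    """Return "Zip" or "Image", the values accepted by --package-type."""
    if unzipped_size_mb <= ZIP_LIMIT_MB:
        return "Zip"
    if unzipped_size_mb <= IMAGE_LIMIT_MB:
        return "Image"
    raise ValueError(f"{unzipped_size_mb} MB is over the container image limit")


if __name__ == "__main__":
    print(choose_package_type(300))  # -> Image
```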
The libraries used in this tutorial are pandas, requests, psycopg2-binary, and pgcopy.
Extract, transform, and load (ETL) functions pull data from one source and load it into another. In this tutorial, the ETL function pulls data from a finance API called Alpha Vantage and inserts the data into TimescaleDB. The function connects to the database using values read from environment variables.
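The transform step can be sketched as follows. The sketch assumes the Alpha Vantage `TIME_SERIES_INTRADAY` response shape: a `"Time Series (1min)"` object keyed by timestamp, with `"1. open"` through `"5. volume"` fields. Several pieces are hypothetical and not taken from the tutorial:

- the environment variable names (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`)
- the function names
- the row layout

The API request and the database insert are left out so the sketch runs on its own.

```python
import os
from datetime import datetime
from typing import List, Mapping, Optional, Tuple

# Hypothetical row layout: (time, symbol, open, high, low, close, volume)
Row = Tuple[datetime, str, float, float, float, float, int]


def connection_params_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect database connection settings from hypothetical env var names."""
    env = os.environ if env is None else env
    return {
        "host": env["DB_HOST"],
        "port": int(env.get("DB_PORT", "5432")),  # default PostgreSQL port
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }


def transform(payload: dict, symbol: str, interval: str = "1min") -> List[Row]:
    """Flatten an Alpha Vantage intraday payload into time-ordered row tuples."""
    series = payload[f"Time Series ({interval})"]
    rows = [
        (
            datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
            symbol,
            float(bar["1. open"]),
            float(bar["2. high"]),
            float(bar["3. low"]),
            float(bar["4. close"]),
            int(bar["5. volume"]),
        )
        for stamp, bar in series.items()
    ]
    rows.sort(key=lambda row: row[0])  # oldest first
    return rows
```

Using these pieces downstream:

- The parameter dict could be passed as `psycopg2.connect(**params)`.
- A list of tuples like this is the kind of input pgcopy's `CopyManager.copy()` takes for bulk insertion.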
This is the ETL function used in this tutorial:
Add a requirements file
When you have created the ETL function, list the libraries it needs in a text file called requirements.txt in your project. This is the requirements.txt file used in this tutorial:
pandas
requests
psycopg2-binary
pgcopy
note
This example uses psycopg2-binary instead of psycopg2 in the requirements.txt file. The binary package ships with its dependencies, including the libpq client library, so you don't need to install them separately.
When you have the requirements set up, you can create the Dockerfile for the project.
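Use the AWS Lambda base image for Python 3.8:
FROM public.ecr.aws/lambda/python:3.8
Copy the function code and the requirements file into the image's working directory:
COPY function.py requirements.txt ./
Install the libraries using the requirements file:
RUN pip install -r requirements.txt
Set the handler to the handler function in function.py:
CMD ["function.handler"]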
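`CMD ["function.handler"]` tells the Lambda runtime to load `function.py` and call its `handler` function. Below is a minimal sketch of that entry point, assuming the standard Python handler signature `handler(event, context)`. The body and return value are placeholders, not the tutorial's code. Scheduled EventBridge invocations, like the one set up later in this tutorial, pass an event whose `"source"` is `"aws.events"`.

```python
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Lambda entry point: CMD ["function.handler"] resolves to function.handler."""
    # Scheduled EventBridge events set "source" to "aws.events";
    # manual test invocations may send any JSON object.
    source = event.get("source", "unknown") if isinstance(event, dict) else "unknown"

    # Placeholder: the extract, transform, and load steps would run here.

    return {"status": "ok", "source": source}
```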
Upload the image to ECR
To connect the container image to a Lambda function, you need to push it to Amazon Elastic Container Registry (ECR).
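Authenticate the Docker command line interface with your ECR registry:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com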
Build the image:
docker build -t lambda-image .
Create a repository in ECR. In this example, the repository is called lambda-image:
aws ecr create-repository --repository-name lambda-image
Tag your image using the same name as the repository:
docker tag lambda-image:latest <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest
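The tag above follows the private ECR image URI format `<account ID>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below assembles that string so it can be reused wherever the image is referenced, such as the `--code` flag of `create-function`. `ecr_image_uri` is a hypothetical helper, and it assumes the standard AWS partition (domain `amazonaws.com`).

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Assemble a private ECR image URI (standard AWS partition assumed)."""
    # AWS account IDs are 12-digit strings; catch an obvious copy/paste slip.
    if len(account_id) != 12 or not account_id.isdigit():
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(ecr_image_uri("123456789012", "us-east-1", "lambda-image"))
    # -> 123456789012.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest
```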
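Push the image to the repository:
docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest
Create a Lambda function from the image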
To create a Lambda function from your container image, use the Lambda create-function command. Set the --package-type parameter to Image, and pass the ECR image URI with the --code flag, in the form ImageUri=<ECR image URI>.
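The `create-function` call can be assembled programmatically, as in the sketch below. `create-function` also requires an execution role ARN via `--role`, which the text above does not cover. The role ARN, function name, and helper name are all hypothetical placeholders. You can print the argument list and copy it, or run it with `subprocess.run(args, check=True)` if the AWS CLI is installed and configured.

```python
import shlex
from typing import List


def create_function_args(function_name: str, image_uri: str, role_arn: str) -> List[str]:
    """Build `aws lambda create-function` arguments for a container-image function."""
    return [
        "aws", "lambda", "create-function",
        "--function-name", function_name,
        "--package-type", "Image",          # container image, not a .zip
        "--code", f"ImageUri={image_uri}",  # the ECR image URI
        "--role", role_arn,                 # execution role (required)
    ]


if __name__ == "__main__":
    args = create_function_args(
        "etl-function",  # hypothetical function name
        "123456789012.dkr.ecr.us-east-1.amazonaws.com/lambda-image:latest",
        "arn:aws:iam::123456789012:role/lambda-etl-role",  # hypothetical role
    )
    print(shlex.join(args))
```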
Schedule the Lambda function
If you want to run your Lambda function on a schedule, you can set up an EventBridge trigger. This creates a rule that uses a cron expression.
Create the schedule. In this example, the function runs every day at 9:00 AM UTC:
aws events put-rule --name schedule-lambda --schedule-expression 'cron(0 9 * * ? *)'
Grant EventBridge permission to invoke the Lambda function:
aws lambda add-permission --function-name <FUNCTION_NAME> \
--statement-id my-scheduled-event --action 'lambda:InvokeFunction' \
--principal events.amazonaws.com
Add the function to the EventBridge rule by creating a targets.json file that contains a unique ID of your choice for the target and the ARN of the Lambda function:
[
{
"Id": "docker_lambda_trigger",
"Arn": "<ARN_LAMBDA_FUNCTION>"
}
]
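The sketch below produces the same `targets.json` content. It also builds the `aws events put-targets` command, which is the AWS CLI call that attaches the targets listed in such a file to a rule, here the `schedule-lambda` rule created earlier. The helper names are hypothetical and the ARN is a placeholder.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_targets(lambda_arn: str, target_id: str = "docker_lambda_trigger") -> List[Dict[str, str]]:
    """Return an EventBridge target list with a single Lambda target."""
    # The Id only has to be unique among the targets of this rule.
    return [{"Id": target_id, "Arn": lambda_arn}]


def write_targets_file(targets: List[Dict[str, str]], path: str = "targets.json") -> Path:
    """Write the target list as JSON so the CLI can read it with file://."""
    out = Path(path)
    out.write_text(json.dumps(targets, indent=2))
    return out


def put_targets_args(rule_name: str, targets_path: str) -> List[str]:
    """Build `aws events put-targets` arguments that read targets from a file."""
    return ["aws", "events", "put-targets",
            "--rule", rule_name,
            "--targets", f"file://{targets_path}"]


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:123456789012:function:etl-function"  # placeholder
    path = write_targets_file(build_targets(arn))
    print(" ".join(put_targets_args("schedule-lambda", str(path))))
```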
important
If you get an error saying Parameter ScheduleExpression is not valid, you might have made a mistake in the cron expression. Check the cron expression examples documentation.
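A quick structural check can catch some common cron mistakes before you call `put-rule`. EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), unlike the five-field Unix cron. If day-of-month or day-of-week holds a value or `*`, the other field must be `?`.

The sketch below checks only those rules, not value ranges, so passing it does not guarantee that AWS accepts the expression. `check_eventbridge_cron` is a hypothetical name.

```python
import re
from typing import Dict

FIELDS = ("minutes", "hours", "day-of-month", "month", "day-of-week", "year")
DAY_FIELDS = ("day-of-month", "day-of-week")


def check_eventbridge_cron(expression: str) -> Dict[str, str]:
    """Light structural check of an EventBridge cron(...) expression.

    Only catches common slips; value ranges are not validated.
    """
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if not match:
        raise ValueError("expression must look like cron(...)")
    parts = match.group(1).split()
    if len(parts) != 6:
        # A five-field Unix-style cron expression lands here.
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # If one day field has a value or '*', the other must be '?'.
    if fields["day-of-month"] != "?" and fields["day-of-week"] != "?":
        raise ValueError("one of day-of-month and day-of-week must be '?'")
    # The '?' wildcard is only meaningful in the two day fields.
    for name, value in fields.items():
        if name not in DAY_FIELDS and "?" in value:
            raise ValueError(f"'?' is not allowed in the {name} field")
    return fields


if __name__ == "__main__":
    print(check_eventbridge_cron("cron(0 9 * * ? *)"))
```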