Olivier Truong

Dev Workflow Notes

VSCode debugging with pipenv

First, Cmd+Shift+P → Python: Select Interpreter → select any interpreter for now.

Observe that a .vscode/settings.json file is automatically created.

Open the settings.json file and find the python.pythonPath setting.

Use pipenv --venv to find the venv path. It looks like this: /Users/olivier/.local/share/virtualenvs/sync_third_party_streams-diSaDxUs

Append /bin/python to the path and set python.pythonPath to it.

It should finally look like this: "python.pythonPath": "/Users/olivier/.local/share/virtualenvs/sync_third_party_streams-diSaDxUs/bin/python"
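
To confirm the debugger actually picked up the pipenv interpreter, a quick check is to print sys.executable from the script being debugged; it should point into the virtualenv above:

import sys

print(sys.executable)  # expect .../virtualenvs/sync_third_party_streams-.../bin/python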

AWS RDS MySQL

Problems connecting to the database from an external client (e.g. a Mac)?

  • Make sure the DB instance setting Public accessibility (go to Modify to access it) is set to Yes
  • Make sure VPC security group allows inbound traffic from your IP address

SQLAlchemy

Import module error?

Make sure a Python MySQL driver such as PyMySQL is installed: run pipenv install pymysql, then connect to the DB using a mysql+pymysql:// URL.
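
A minimal connection sketch with that driver (host and credentials are placeholders):

import sqlalchemy

# mysql+pymysql:// tells SQLAlchemy to use the PyMySQL driver installed above
engine = sqlalchemy.create_engine(
    "mysql+pymysql://user:password@mydb.xxxxxxxx.us-east-1.rds.amazonaws.com:3306/mydb"
)

with engine.connect() as conn:
    print(conn.execute(sqlalchemy.text("SELECT 1")).scalar())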

Tip for speeding up mass updates: load the new values into a temporary table, then UPDATE the real table with a join against it.

Speed things up further by creating an index on the temp table's join column when it isn't a primary key (like usernames).
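
A rough SQLAlchemy sketch of the pattern (table and column names are made up):

from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@host/db")  # placeholder URL

# New values to apply, keyed by a non-primary-key column (username)
updates = [{"username": "alice", "status": "active"}, {"username": "bob", "status": "banned"}]

with engine.begin() as conn:
    conn.execute(text("CREATE TEMPORARY TABLE tmp_updates (username VARCHAR(64), status VARCHAR(32))"))
    conn.execute(text("INSERT INTO tmp_updates (username, status) VALUES (:username, :status)"), updates)
    # Indexing the non-primary-key join column keeps the UPDATE ... JOIN from scanning
    conn.execute(text("CREATE INDEX idx_tmp_username ON tmp_updates (username)"))
    conn.execute(text("UPDATE users u JOIN tmp_updates t ON u.username = t.username SET u.status = t.status"))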

AWS Lambda

Allowing Lambda to connect to RDS and the public internet at the same time is tricky. The recommended setup uses VPC security groups to limit traffic to RDS from known sources (like my IP address), so Lambda and RDS end up configured in the same VPC. That's a problem because a Lambda inside a VPC can't reach the public internet; it can only talk to RDS and other services inside the VPC.

The proper fix is a NAT gateway, but that costs $35/month, which is too expensive this early in development. So the easiest way to let Lambda reach both RDS and the public internet (to hit external APIs) is: update the VPC security group that RDS uses to allow all inbound traffic, which lets Lambda talk to RDS, then remove the VPC from the Lambda configuration, which lets it talk to anything and everything, including the public internet.

However, if you were to use VPC with a NAT gateway, this is the guide to follow: https://gist.github.com/reggi/dc5f2620b7b4f515e68e46255ac042a7

My addendum to that guide is the following:

  • Create the NAT gateway first (the guide lists instructions at the bottom instead of the top)

Python-lambda library

  • Make sure you call lambda deploy with the --preserve-vpc flag to prevent your VPC settings from being reset
  • Double-check the settings in config.yaml, e.g. timeout (a sketch follows this list)
  • Problems? Try resetting pipenv: pipenv --rm && pipenv --python 3.7 && pipenv install && pipenv run pip install python-lambda && pipenv run lambda deploy --local-package /Users/olivier/dev/live/streamdl
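
For reference, a sketch of the config.yaml fields worth double-checking. The key names here are from memory and may differ across python-lambda versions, so verify them against the library's README:

# config.yaml (sketch)
function_name: sync_third_party_streams
handler: service.handler
region: us-east-1
runtime: python3.7
timeout: 300        # seconds
memory_size: 512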

SQL

  • Emptying a table is called truncating it: TRUNCATE TABLE table_name;

Elastic Beanstalk

How to install FFmpeg on Elastic Beanstalk: https://stackoverflow.com/questions/39241654/how-to-install-ffmpeg-on-elastic-beanstalk

Unix

How to find all processes by name? ps aux | grep ffmpeg

How to kill all processes by name? sudo pkill -f ffmpeg

Logging in Python

import logging

# Optional: log to a file. Without basicConfig, the root logger only emits WARNING and above.
logging.basicConfig(filename='/opt/python/log/flask.log', level=logging.DEBUG)

logging.info("something happened")

Celery

Problem starting the celery worker?

Given this directory structure: ~/dev/live/flask-worker/worker/celery.py

the command is: celery -A worker.celery:app -Q network,celery -O fair worker -l info

  • Note that -A worker.celery:app resolves the celery.py inside the worker package, so make sure you run the command from flask-worker (the directory above worker/), not from inside it. A minimal celery.py sketch follows this list.
  • -Q is optional. -O fair is recommended.
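
For reference, a minimal worker/celery.py that -A worker.celery:app would resolve (the broker URL and task are placeholders):

# worker/celery.py (sketch)
from celery import Celery

app = Celery("worker", broker="redis://localhost:6379/0")

@app.task
def ping():
    return "pong"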

Error: Unrecoverable error: ImportError('The curl client requires the pycurl library.')

Running python -c 'import pycurl' should print: ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)

  • Fix
    • pip uninstall pycurl
    • brew install openssl
    • export CPPFLAGS="-I/usr/local/opt/openssl@1.1/include"
    • export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib"
    • pip install --compile --install-option="--with-openssl" pycurl

Problems installing Celery or pycurl because of this error?

Could not run curl-config

Fix: add this to .ebextensions config

packages:
  yum:
    libcurl-devel: []

Daemonizing Celery on AWS Elastic Beanstalk using supervisord

In .ebextensions config, add:

container_commands:
  01-celery:
    command: "chmod +x .ebextensions/deploy.sh && .ebextensions/deploy.sh"

Create deploy.sh:

#!/usr/bin/env bash
set -e

SCRIPT_PATH=`dirname $0`

# Print an error and abort the deploy (copy_ext below relies on this)
error_exit() {
    echo "$1" 1>&2
    exit 1
}

# Copy + chmod + chown
# copy_ext source target 0755 user:group
copy_ext() {
    #cp + chmod + chown
    local source=$1
    local target=$2
    local permission=$3
    local user=$4
    local group=$5
    if ! cp $source $target; then
        error_exit "Can not copy ${source} to ${target}"
    fi
    if ! chmod -R $permission $target; then
        error_exit "Can not do chmod ${permission} for ${target}"
    fi
    if ! chown $user:$group $target; then
        error_exit "Can not do chown ${user}:${group} for ${target}"
    fi
    echo "cp_ext: ${source} -> ${target} chmod ${permission} & chown ${user}:${group}"
}

script_add_line() {
    local target_file=$1
    local check_text=$2
    local add_text=$3

    if grep -q "$check_text" "$target_file"
    then
        echo "Modification ${check_text} found in ${target_file}"
    else
        echo ${add_text} >> ${target_file}
        echo "Modification ${add_text} added to ${target_file}"
    fi
}

copy_ext $SCRIPT_PATH/celeryd.conf /opt/python/etc/celeryd.conf 0755 root root

# include celeryd.conf into the supervisord.conf
script_add_line /opt/python/etc/supervisord.conf "include" "[include]"
script_add_line /opt/python/etc/supervisord.conf "celeryd.conf" "files=celeryd.conf "

# Reread the supervisord config
supervisorctl -c /opt/python/etc/supervisord.conf reread
# Update supervisord in cache without restarting all services
supervisorctl -c /opt/python/etc/supervisord.conf update
# Start/Restart celeryd through supervisord
supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd

Create celeryd.conf:

; ==================================
;  celery worker supervisor example
; ==================================

; the program name must match the "restart celeryd" target used in deploy.sh
[program:celeryd]
; Set full path to celery program if using virtualenv
command=/opt/python/run/venv/bin/celery worker -A worker.celery:app --loglevel=INFO

directory=/opt/python/current/app
user=wsgi
numprocs=1
stdout_logfile=/opt/python/log/celery-worker.log
stderr_logfile=/opt/python/log/celery-worker.log
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true

; Set Celery priority higher than default (999)
; so, if rabbitmq is supervised, it will start first.
priority=1000

Supervisord

sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf reload

sudo /usr/local/bin/supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd

AWS ECS + Docker

Getting started

  • Install ECS CLI

    • sudo curl -o /usr/local/bin/ecs-cli https://amazon-ecs-cli.s3.amazonaws.com/ecs-cli-darwin-amd64-latest
    • sudo chmod +x /usr/local/bin/ecs-cli
    • ecs-cli configure profile --profile-name olivier --access-key <key> --secret-key <key>
    • ecs-cli configure --cluster live-workers --default-launch-type EC2 --region us-east-1 --config-name
  • Create EC2 key pair

    • name it olivier-keypair-useast1
    • save it to ~/.ssh
  • Create ECS cluster

    • ecs-cli up --keypair olivier-keypair-useast1 --capability-iam --size 1 --instance-type t2.micro
      • Add AmazonECS_FullAccess policy to IAM user
      • If using t3 instance type, remember to disable UNLIMITED to avoid surprises in the bill
  • use docker-compose.yml for local dev

    # docker-compose.yml
    
    version: "3"
    services:
      worker1:
        build: .
        tmpfs: /tmp
        volumes:
          - .:/opt/app # mirrors *local* current dir to *container* /opt/app
        environment:
          - CELERY_ENV=dev
          - CELERY_PROJECT_NAME=live-dev
      worker2:
        build: .
        volumes:
          - .:/opt/app
        tmpfs: /tmp
        environment:
          - CELERY_ENV=dev
          - CELERY_PROJECT_NAME=live-dev
        command: ["-Q", "network"] # extends ENTRYPOINT defined in the Dockerfile
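
    For context, here's a sketch of the Dockerfile ENTRYPOINT that command: extends. The real Dockerfile isn't shown in these notes, so treat this as an assumption; it just follows the /opt/app convention above:

    # Dockerfile (sketch)
    FROM python:3.7
    WORKDIR /opt/app
    COPY . /opt/app
    RUN pip install pipenv && pipenv install --system --deploy
    # docker-compose's command: appends its args to this ENTRYPOINT, e.g. -Q network
    ENTRYPOINT ["celery", "-A", "worker.celery:app", "worker", "-l", "info"]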
    
  • Dev workflow

    • Local
      • docker-compose up --build or docker-compose restart
      • docker-compose exec worker1 <command>
    • On EC2
      • First SSH to the instance (see "How to SSH into ECS EC2 instance" below)
      • docker ps
      • docker exec <container ID> <command>
      • docker stats is also nice
  • If you use build: . and volumes: - .:/opt/app and your Dockerfile uses /opt/app, you can speed up testing your code changes by doing docker-compose restart instead of re-building with docker-compose up --build

  • use ecs-compose.yml (name is arbitrary) for pushing to ECS

    • docker build -t o/live-workers .
    • docker push o/live-workers
    • ecs-cli compose --file ecs-compose.yml up
    # ecs-compose.yml
    
    version: "3"
    services:
      worker:
        image: olivierdecaen/live-workers
        environment:
          - CELERY_ENV=dev
          - CELERY_PROJECT_NAME=live-dev
          - FFMPEG_BIN_PATH=ffmpeg
        deploy:
          resources:
            limits:
              cpus: "1.0"
              memory: 500M
    
    • You can specify ECS-specific parameters with an ecs-params.yml file
      • You can use --ecs-params like so: ecs-cli compose --file ecs-compose.yml --ecs-params ecs-params.yml up
    version: 1
    
    task_definition:
      ecs_network_mode: "awsvpc"
      services:
        worker1:
          essential: true
        worker2:
        worker3:
        worker4:
        redis:
    run_params:
      network_configuration:
        awsvpc_configuration:
          subnets:
            - subnet-05d06c7f28bcdcf21
            - subnet-0ee123d5f856fe170
          security_groups:
            - sg-0dfca200d00ba01bb
    
  • Logging

    • Send logs to AWS CloudWatch. See the logging config below
      • Make sure to run ecs-cli compose --file ecs-compose.yml up again after adding the logging config
      • Now you can see logs in the CloudWatch console → Logs → Log groups
    # ecs-compose.yml
    
    version: "2"
    services:
      worker1:
        image: olivierdecaen/live-workers
        environment:
          - AWS_ACCESS_KEY_ID=
          - AWS_SECRET_ACCESS_KEY=
          - CELERY_ENV=prod
          - CELERY_PROJECT_NAME=live
        mem_limit: "100m"
        logging:
          driver: awslogs
          options:
            awslogs-region: us-east-1
            awslogs-create-group: true
            awslogs-group: live-workers
    
    • Errors
      • CannotStartContainerError: Error response from daemon: failed to initialize logging driver: failed to create Cloudwatch log group: AccessDeniedException: User: arn:aws:sts::250793284322:assumed-role/amazon-ecs-cli-setup-live-workers-EcsInstanceRole-1OR3A
        • IAM → Role → amazon-ecs-cli-setup-live-workers-EcsInstanceRole-1OR3A → Attach permission that has the logs:CreateLogGroup policy.

How to SSH into ECS EC2 instance

  • Open up port 22
    • EC2 Instance Detail → Select the Security Group → Inbound tab → Edit → Add Rule → SSH
  • ssh -i ~/.ssh/olivier-keypair-useast1.pem ec2-user@ec2-3-82-129-237.compute-1.amazonaws.com

ECS networking bridge between services

To add a redis service, do the following:

# docker-compose.yml

services:
    .. # N-1 services
    redis:
        image: "redis:6.0-rc-alpine"

# ecs-params.yml

version: 1

task_definition:
  ecs_network_mode: "awsvpc"
  services:
    worker1:
      essential: true
    worker2:
    worker3:
    worker4:
    redis:
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - subnet-057bc915a501017d5 # use subnets that share the same VPC with the EC2 instance
        - subnet-0ec47e266614ed647
      security_groups:
        - sg-0f0ef66080ff23778 # same here, same VPC

AWS IAM permissions

  • Policy not working? Double-check that the statement's Effect is set to "Allow".
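
For reference, a minimal policy document showing where that Allow lives (the action here is just an example):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "*"
    }
  ]
}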

AWS SSH

  • Make sure your .pem keypair permissions are set correctly using chmod 400 ~/.ssh/olivier-keypair-useast1.pem or you'll get this error when using it to SSH auth: "Permissions 0644 for '/Users/olivier/.ssh/olivier-keypair-useast1.pem' are too open. It is required that your private key files are NOT accessible by others."

Installing FFmpeg inside a Docker container

  • Download static build from https://johnvansickle.com/ffmpeg and put it inside the Dockerfile root directory. Don't use wget to download it at build time because the website links to a release version that might get updated without you knowing it (and could break things).

# Dockerfile

RUN mkdir /opt/ffmpeg
RUN tar xvf ffmpeg-4.2.2-amd64-static.tar.xz -C /opt/ffmpeg
RUN ln -sf /opt/ffmpeg/ffmpeg-4.2.2-amd64-static/ffmpeg /usr/local/bin/ffmpeg

Redis

Make sure to use decode_responses=True to automatically decode redis results. Redis will otherwise return byte strings.

cache = redis.Redis(host='redis', port=6379, decode_responses=True)
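
A quick illustration of the difference (assumes the same "redis" host as above):

import redis

cache = redis.Redis(host='redis', port=6379, decode_responses=True)
cache.set('greeting', 'hello')
print(cache.get('greeting'))  # 'hello'; without decode_responses=True this returns b'hello'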

ECS: configure private registry authentication so you can keep your images private on Docker Hub

Follow the instructions on https://docs.aws.amazon.com/AmazonECS/latest/developerguide/private-auth-container-instances.html

  • sudo vi /etc/ecs/ecs.config
ECS_ENGINE_AUTH_TYPE=docker
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":{"username":"","password":"","email":"olivierxtruong@gmail.com"}}
  • sudo systemctl stop ecs
  • sudo systemctl restart ecs
  • verify the ECS agent is running: curl http://localhost:51678/v1/metadata

Start interactive shell inside docker container

  • docker exec -it <your container id> /bin/bash -l

Installing streamlink local package

  • Run cd streamlink && pipenv run pip install .
  • ModuleNotFoundError? Make sure you ran pip install . and not python setup.py install

Maintaining a local python package and installing it for other projects

File structure for streamdl package

streamdl/
- setup.py
- streamdl/
    - __init__.py
    - streamdl.py
  1. Create setup.py
import setuptools

setuptools.setup(
    name="streamdl",
    version="0.0.1",
    author="Example Author",
    author_email="author@example.com",
    description="A small example package",
    long_description="",
    long_description_content_type="text/markdown",
    url="https://github.com/pypa/sampleproject",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)
  2. Run pipenv run python setup.py sdist bdist_wheel

  3. Install it into the project by running pipenv run pip install -t /Users/olivier/dev/live/lambdas/sync_third_party_streams .

  4. You're responsible for pip-installing streamdl's dependencies in your project's pipenv. So the next step is to run pipenv install m3u8 (and likewise for any other dependency) inside sync_third_party_streams

Strip metadata from video

ffmpeg -i /Users/olivier/Downloads/ios-app-demo.mp4 -map_metadata -1 -c:v copy -c:a copy /Users/olivier/Downloads/ios-app-demo-no-metadata.mp4

Deployment checklist

  • Run npm run build inside reactapp
    • The postbuild script defined inside reactapp/package.json is responsible for copying the build output to server/client/ (a sketch follows this list)
  • Run gcloud app deploy --project projectlive
  • Finished!
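
A rough idea of what that postbuild hook looks like (the actual script isn't shown in these notes, so treat the commands and paths as guesses):

# reactapp/package.json (sketch)
{
  "scripts": {
    "build": "react-scripts build",
    "postbuild": "rm -rf ../server/client && cp -r build ../server/client"
  }
}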

Google Cloud Storage

Creating a bucket

gsutil mb -p projectlive gs://projectlive

Adding files to bucket

gsutil -m rsync -rnd -x '.*/\..*|^\..*' /Users/olivier/dev/live/web/reactapp/ios_splash_images gs://projectlive/splash

  • -m flag is for running operations in parallel
  • -r is for recurse
  • -d is for mirroring the source and destination
    • WARNING: if you make a mistake when specifying the directories, you can accidentally lose a bunch of files.
      • Use -n to do a test run without performing any actual operations
  • -x '.*/\..*|^\..*' is to ignore hidden files like .DS_Store

Google Chrome

Allow invalid certificates over localhost (for connecting to localhost over HTTPS)

chrome://flags/#allow-insecure-localhost

Source: https://superuser.com/questions/772762/how-can-i-disable-security-checks-for-localhost