Harmony

Services. Together.

Harmony has two fundamental goals in life:

  1. Services - Increase usage and ease of use of EOSDIS' data, especially focusing on opportunities made possible now that data from multiple DAACs reside in AWS. Users should be able to work seamlessly across data from different DAACs in ways previously unachievable.
  2. Together - Transform how we, as a development community, work together to accomplish goal number 1. Let's reuse the simple, but necessary components (e.g. EDL, UMM, CMR and Metrics integration) and let's work together on the stuff that's hard (and fun) like chaining, scaling and cloud optimizations.

For general project information, visit the Harmony wiki. Harmony discussion and collaboration occurs in the EOSDIS #harmony Slack channel.

Table of Contents

  1. Development Prerequisites
    1. Earthdata Login Application Requirement
    2. Software Requirements
  2. Running Harmony
    1. Set Up Environment
    2. Set Up Environment Variables
    3. Run Tests
    4. Set Up A Database
    5. Set Up and Run Argo, Localstack
    6. Add A Service Backend
    7. Run Harmony
    8. Connect A Client
  3. Local Development Of Workflows Using Visual Studio Code
  4. Running in AWS
  5. Contributing to Harmony
  6. Additional Resources

Development Prerequisites

For developing Harmony on Windows, follow this document as well as the information in docs/dev_container/README.md.

Earthdata Login Application Requirement

To use Earthdata Login with a locally running Harmony, you must first set up a new application in the Earthdata Login UAT environment using the Earthdata Login UI (https://wiki.earthdata.nasa.gov/display/EL/How+To+Register+An+Application). This is a four-step process:

  1. Request and receive permission to be an Application Creator
  2. Create a local/dev Harmony Application in the EDL web interface
  3. Add the necessary Required Application Group
  4. Update .env with credentials

You must select "401" as the application type for Harmony to work correctly with command line clients and clients like QGIS. You will also need to add the "eosdis_enterprise" group to the list of required application groups in order for CMR searches issued by Harmony to be able to use your Earthdata Login tokens. Update OAUTH_CLIENT_ID and OAUTH_PASSWORD in .env with the information from your Earthdata Login application. Additional information including other OAUTH values to use when creating the application can be found in the example/dotenv file in this repository.
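For reference, the relevant entries in .env look like the following (the values shown are placeholders; use the credentials from your own EDL application):

# Credentials from your UAT Earthdata Login application (placeholder values)
OAUTH_CLIENT_ID=your-edl-client-id
OAUTH_PASSWORD=your-edl-app-password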

Software Requirements

Required:

  • A local copy of this repository. Using git clone is strongly recommended
  • Node.js version 12. We strongly recommend installing NVM to add and manage node versions.
  • Mac OSX, Linux, or similar command line tooling. Harmony is tested to run on OSX >= 10.14 and Amazon Linux 2. Command-line instructions and bash helper files under bin/ are tested on OSX >= 10.14.
  • git - Used to clone this repository
  • A running Docker Desktop or daemon instance - Used to invoke docker-based services
  • Docker Compose version 1.20.0 or greater (preferably the latest version, v1.26 or greater)
  • The AWS CLI - Used to interact with both localstack and real AWS accounts
  • SQLite3 commandline - Used to create the local development and test databases. Install using your OS package manager, or download precompiled binaries from SQLite
  • PostgreSQL (required by the pg-native library) - brew install postgresql on OSX
  • Earthdata Login application in UAT (details in the 'Earthdata Login Application Requirement' section above)
  • kubectl - A command-line application for interfacing with a Kubernetes API.

Highly Recommended:

  • An Amazon Web Services account - Used for testing Harmony against object stores and running Harmony in AWS
  • An editor with syntax awareness of modern Javascript. If you do not have this or any preference, consider Visual Studio Code

Optional:

  • awscli-local - CLI helpers for interacting with localstack
  • Python version 3.7 - Useful for locally running and testing harmony-docker and other backend services

Running Harmony

Set Up Environment

If you have not yet cloned the Harmony repository, run

$ git clone https://github.com/nasa/harmony.git

Ensure node is available and is the correct version, 12.x.y, where "x" >= 14.

$ node --version
v12.22.1

If it is not the correct version and you are using NVM, install it and ensure your PATH is up-to-date by running:

$ nvm install && nvm use && node --version
...
<NVM output>
...
v12.22.1

Be sure to verify the version on the final line to make sure the NVM binary appears first in your PATH.

From the harmony project root, install library dependencies:

$ npm install

Recommended: Add ./node_modules/.bin to your PATH. This will allow you to run binaries from installed node modules. If you choose not to do this, you will need to prefix node module calls with npx, e.g. npx mocha instead of just mocha
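For example, in bash (note that the relative path only resolves when your working directory is the project root):

$ export PATH=./node_modules/.bin:$PATH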

Set Up Environment Variables

Read the ENV_CHANGELOG.md file to see what environment variables have been added, dropped, or changed. Copy the file example/dotenv to a file named .env in the root project directory. Follow the instructions in that file to populate any blank variables. Variables that have values in the example can be kept as-is, as they provide good defaults for local development. To check environment differences between the example and local, run:

$ git diff --no-index .env example/dotenv

We recommend doing this any time you receive an example/dotenv update to ensure there are no new variables needed.

Run Tests

To run the linter, tests, and coverage checks as the CI environment will, run

$ npm test

Harmony uses eslint as a linter, which can be invoked as $ npx eslint (or $ eslint if you have set up your PATH). It uses mocha for tests, $ npx mocha, and nyc for code coverage, $ npx nyc mocha.

Test Fixtures

Rather than repeatedly perform the same queries against the CMR, our test suite uses node-replay to record and play back HTTP interactions. All non-localhost interactions are recorded and placed in files in the fixtures directory.

By default, the test suite will playback interactions it has already seen and record any new interactions to new files. This behavior can be changed by setting the REPLAY environment variable, as described in the node-replay README.
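For example, to explicitly select a mode for a single run (mode names are documented in the node-replay README):

$ REPLAY=record npx mocha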

To re-record everything, remove the fixtures directory and run the test suite. This should be done to cull the recordings when a code change makes many of them obsolete, when CMR adds response fields that Harmony needs to make use of, and periodically to ensure no impactful CMR changes or regressions.

Set Up A Database

To setup a sqlite3 database with the correct schema for local execution, run

$ bin/create-database development

This should be run any time the versioned contents of the db/db.sql file change.

This will create a file, db/development.sqlite3, which will contain your local data. You can delete the above file to remove all existing development data.
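To sanity-check that the schema was created, you can list the tables (names will match those defined in db/db.sql):

$ sqlite3 db/development.sqlite3 '.tables'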

In production environments, we use PostgreSQL and use database migrations to modify the schema. If you have a PostgreSQL database, you can create and/or migrate your database by setting NODE_ENV=production and DATABASE_URL=postgresql://your-postgres-connection-url and running:

$ npx knex --cwd db migrate:latest
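For example, with a placeholder connection URL:

$ NODE_ENV=production DATABASE_URL=postgresql://user:password@your-host:5432/harmony npx knex --cwd db migrate:latest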

Set Up and Run Argo, Localstack

Harmony uses Argo Workflows to manage job executions. In development, we use Localstack to avoid allocating AWS resources.

Prerequisites

  • Mac:
    • Install Docker Desktop (https://www.docker.com/products/docker-desktop). Docker Desktop comes bundled with Kubernetes and kubectl. If you encounter issues running kubectl commands, first make sure you are running the version bundled with Docker Desktop.
    • Run Kubernetes in Docker Desktop by selecting Preferences -> Kubernetes -> Enable Kubernetes
    • Install the Argo CLI, the command line interface to Argo
  • Linux / Generic:
    • Install minikube, a single-node kubernetes cluster useful for local development
    • Install kubectl, a command line interface to kubernetes.
    • Install the Argo CLI, the command line interface to Argo

Installing and running Argo and Localstack on Kubernetes

$ ./bin/start-argo

This will install Argo and forward port 2746 to localhost. It will take a few minutes the first time you run it. You will know when it has completed when it prints

Handling connection for 2746

You can then connect to the Argo Server UI at http://localhost:2746.

You can change the startup port by adding the -p option, e.g., for port 8080:

$ ./bin/start-argo -p 8080

minikube will default to using the docker driver. You can change the driver used by minikube by passing the -d option to start-argo:

$ ./bin/start-argo -d DRIVER

where DRIVER is one of the VM drivers supported by minikube (see the minikube drivers documentation).

Deleting applications and stopping Kubernetes

To delete the argo and localstack deployment, run:

$ kubectl delete namespaces argo

minikube users can stop Kubernetes by pressing ctrl-C on the bin/start-argo process or run minikube stop. Docker Desktop users will need to close Docker or disable Kubernetes support in the UI. Note that the latter uninstalls kubectl.

(minikube only) Configuring the callback URL for backend services

You can skip this step if you are using the default docker driver for minikube and set CALLBACK_URL_ROOT as described in the example dotenv file. If you are using a different driver such as virtualbox you may need to execute the following command to get the IP address minikube has bridged to localhost:

$ minikube ssh grep host.minikube.internal /etc/hosts | cut -f1

This should print out an IP address. Use this in your .env file to specify the CALLBACK_URL_ROOT value, e.g., CALLBACK_URL_ROOT=http://192.168.65.2:4001.

Add A Service Backend

Clone the Harmony GDAL service repository into a peer directory of the main Harmony repo

$ cd ..
$ git clone https://github.com/nasa/harmony-gdal.git

(minikube only) From the harmony-gdal project root, run

$ eval $(minikube docker-env)

This will set up the proper environment for building the image so that it may be used in minikube and Argo. Next run the following command to build and locally install the image:

$ ./bin/build-image

This may take some time, but ultimately it will produce a local docker image tagged harmony/gdal:latest. You may choose to use another service appropriate to your collection if you have adapted it to run in Harmony.
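You can confirm the image was built and tagged by listing it:

$ docker images harmony/gdal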

Run Harmony

To run Harmony locally such that it reloads when files change (recommended during development), run

$ npm run start-dev

In production, we use $ npm run start which does the same but does not add the file watching and reloading behavior.

You should see messages about the two applications listening on two ports, "frontend" and "backend." The frontend application receives requests from users, while the backend application receives callbacks from services.

Connect A Client

You should now be able to view the outputs of performing a simple transformation request. Harmony has its own test collection set up for sanity checking harmony with the harmony-gdal backend. This will fetch a granule from that collection converted to GeoTIFF: http://localhost:3000/C1233800302-EEDTEST/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?granuleId=G1233800343-EEDTEST
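If you prefer the command line, the same request can be made with curl, assuming your UAT Earthdata Login credentials are in ~/.netrc (-L follows redirects, -n uses .netrc, and the cookie flags carry the session through the OAuth handshake):

$ curl -Lnbj "http://localhost:3000/C1233800302-EEDTEST/ogc-api-coverages/1.0.0/collections/all/coverage/rangeset?granuleId=G1233800343-EEDTEST" -o out.tif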

You can also set up a WMS connection in QGIS, for example, by placing the http://localhost:3000/C1233800302-EEDTEST/wms as the "URL" field input in the "Connection Details" dialog when adding a new WMS connection. Thereafter, expanding the connection should provide a list of layers obtained through a GetCapabilities call to the test server, and double-clicking a layer should add it to a map, making a WMS call to retrieve an appropriate PNG from the test server.

You can also use the Argo dashboard at http://localhost:2746 to visualize the workflows that were kicked off from your Harmony transformation requests.

Local Development Of Workflows Using Visual Studio Code

This section describes a VS Code based approach to local development. The general ideas are, however, applicable to other editors.

There are two components to local development. The first is mounting your local project directory to a pod in a workflow so that changes to your code are automatically picked up whenever you run the workflow. The second is attaching a debugger to code running in a pod container (unless you prefer the print-debug method, in which case you can use the logs).

Prerequisites

Mounting a local directory to a pod running in a workflow

This is accomplished in two steps. The first step is to mount a local directory to a node in your kubernetes/minikube cluster. On a Mac using the virtualbox driver the /Users directory is automatically mounted as /Users on the single node in minikube. On Linux using the virtualbox driver the /home directory is automatically mounted at /hosthome. Other options for mounting a local directory are described in the minikube documentation.
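If your project directory is not under an auto-mounted path, you can mount it explicitly with minikube (the paths here are examples):

$ minikube mount /Users/username/project_folder:/Users/username/project_folder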

The second step is to mount the directory on the node to a directory on the pod in your workflow. This can be done using a hostPath volume defined in your workflow template. The following snippet creates a volume using the /Users/username/project_folder directory from the node on which the pod runs, not a directory from the local filesystem. Again, on a Mac using virtualbox the local /Users folder is conveniently mounted to the /Users folder on the node.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  volumes:
  - name: test-volume
    hostPath:
      path: /Users/username/project_folder
  ...

You can then mount the volume in your pod using a volumeMounts entry in your container configuration:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  volumes:
  - name: test-volume
    hostPath:
      path: /Users/username/project_folder
  entrypoint: hello
  arguments:
    parameters:
    - name: message
      value: James
  templates:
  - name: hello
    inputs:
      parameters:
      - name: message
    container:
      image: node
      volumeMounts:
      - mountPath: /test-mount
        name: test-volume

Now the pod will be able to access local code directly in the /test-mount directory. Updates to code in the developer's local project will immediately show up in workflows.
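You can verify the mount from a running pod; for example (the pod name here is hypothetical):

$ kubectl exec hello-world-9th8k -- ls /test-mount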

Attaching a debugger to a running workflow

Argo Workflow steps run as kubernetes jobs, which means the containers that run them are short-lived. This somewhat complicates attaching a debugger. To attach a debugger to code running in a container in a workflow, you have to start the code in a manner that pauses it on the first line and waits for a debugger to attach.

For NodeJS code this is easily done by passing the --inspect-brk option to the node command. A workflow template building on our previous example is given here:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  volumes:
  - name: test-volume
    hostPath:
      path: /Users/username/project_folder
  entrypoint: hello
  arguments:
    parameters:
    - name: message
      value: James
  templates:
  - name: hello
    inputs:
      parameters:
      - name: message
    container:
      image: node
      volumeMounts:
      - mountPath: /test-mount
        name: test-volume
      command: [node]
      args: ["--inspect-brk", "/test-mount/index.js", "{{inputs.parameters.message}}"]

In this example the starting point for the step is in the index.js file.

Similar approaches are available for Python and Java, although they might require changes to the code.

Once you launch your workflow it will pause at the step (wait for the icon in the UI to change from yellow to blue and spinning), and you can attach the debugger. For VS Code this is easily done using the kubernetes plugin.

Open the plugin by clicking on the kubernetes icon in the left sidebar. Expand the CLUSTERS tree to show the pods in CLUSTERS>minikube>Nodes>minikube, then ctrl+click on the pod with the same name as the step in your workflow, e.g., hello-world-9th8k (you may need to refresh the view). Select Debug (Attach) from the menu, then select the wait container (not main), and select the runtime environment (java, nodejs, or python).

At this point the editor should open the file that is the starting point for your application, and it should be stopped on the first line of code to be run. You can then perform all the usual debugging operations such as stepping through code and examining variables.
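Alternatively, since --inspect-brk listens on port 9229 by default, you can forward that port from the pod and attach any standard Node.js debugger to localhost:9229 (the pod name is hypothetical):

$ kubectl port-forward hello-world-9th8k 9229:9229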

Running in AWS

Note: It is currently easiest to allow the CI/CD service to deploy the service remotely; it is deployed to the sandbox after each merge to master. As the deployment simply uploads the code, sets environment variables, kills the old server, and runs $ npm run start, there is at present typically not much to be gained by running remotely during development.

When setting up a new environment, the first two steps need to be performed, but the CI environment should be set up to run the deployment rather than having it done manually.

Prerequisites

  • Once per account, run $ bin/account-setup to create a service linked role for ECS.
  • Upload the harmony/gdal Docker image somewhere accessible to an EC2 deployment. This should be done any time the image changes. The easiest way is to create an ECR in your account and push the image there. Running $ bin/build-image && bin/push-image from the harmony-gdal repository will perform this step.

Stop here and set up CI/CD

Deploying the code should be done using the harmony-ci-cd project from Bamboo rather than manually. Apart from that project and CI/CD setup, we do not yet have automation scripts for (re)deploying to AWS manually, as it is typically not needed during development.

Deploy the code to AWS

Note: The harmony-ci-cd repository contains automation code to do the following, usable from Bamboo. You may use it locally by setting all relevant environment variables in a .env file, running $ bin/build-image in the root directory of the harmony-ci-cd project, and then running the harmony-ci-cd bin/deploy script from your harmony codebase's root directory.

  1. scp the Harmony codebase to the remote instance
  2. ssh into the remote instance
  3. Run $ $(aws ecr get-login --region=$AWS_DEFAULT_REGION --no-include-email) where AWS_DEFAULT_REGION is the region containing your harmony-gdal ECR instance. Skip this step if harmony-gdal is not in an ECR.
  4. Run $ if pgrep node; then pkill node; fi to stop any existing server that may be running
  5. Run $ nohup npm start >> ../server.log 2>&1 & to start harmony
  6. Run $ docker pull $GDAL_IMAGE to fetch harmony-gdal changes, where GDAL_IMAGE is the EC2-accessible location of your harmony-gdal Docker image. Repeat for any other docker images you want to use.

Connecting a client to an AWS instance

This process is identical to "Connect a client" above, except instead of http://localhost:3000, the protocol and host should be that of your load balancer, e.g. https://your-load-balancer-name.us-west-2.elb.amazonaws.com. Retrieve the precise load balancer details from the AWS console.

Updating development resources after pulling new code

Once up and running, if you update code, you can ensure dependencies are correct, Argo is deployed, and necessary Docker images are built by running

$ npm run update-dev

Contributing to Harmony

We welcome Pull Requests from developers not on the Harmony team. Please follow the standard "Fork and Pull Request" workflow shown below.

Submitting a Pull Request

If you are a developer on another team and would like to submit a Pull Request to this repo:

  1. Create a fork of the harmony repository.
  2. In the fork repo's permissions, add the edc_snyk_user with Read access
  3. In the #harmony-service-providers Slack channel, ask a Harmony team member to import your fork repo into Snyk (see below).
  4. When ready, submit a PR from the fork's branch back to the harmony master branch. Ideally name the PR with a Jira ticket name (e.g., HARMONY-314)
  5. The PR's 'build' tab should not show errors

Importing a Fork Repo Into Snyk

To run Snyk on a fork of the repo (see above), the developer's fork needs to be imported into Snyk:

  1. Open Snyk
  2. Click Integrations on the navbar at the top of the page
  3. Click the integration type based on where the repo is hosted. E.g.: Bitbucket Server, GitHub, etc.
  4. Search for 'harmony' using the search box
  5. Click the checkbox on the developer's newly-created fork repo
  6. Click the 'Import selected repositories' button

This import should be done before the developer submits a PR. If it hasn't been, the PR 'build' will fail and the PR will be blocked. In this situation, the project can still be imported into Snyk, but the PR will then need to be declined and resubmitted.

Additional Resources
