Cloud Kotta

Recipes for a turnkey, versatile cloud execution service. This is in production at https://turingcompute.net/. Some of the auxiliary systems are not included in the CloudFormation document in the infrastructure folder and must be set up manually.

Use at your own risk

Setup

Install boto, preferably inside a virtualenv for testing.

pip install boto

Update configs with user-specific info.
Run source setup.sh
Run ./aws.py with no args to get a help message on supported operations.

Broker the best deals from Amazon

Amazon Documentation

From the gospel of Amazon's documentation: "If your Spot Instance is interrupted by Amazon EC2, you will not be charged for the interrupted hour. For example, if your Spot Instance is interrupted 59 minutes after it starts, we will not charge you for that 59 minutes. However, if you terminate your instance, you will pay for any partial hour of usage as you would for On-Demand Instances."
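The billing rule quoted above can be sketched as a small function. This is only an illustration of the quoted hourly-billing text; the function name `billed_hours` and its parameters are hypothetical and are not part of this repository or of any AWS API.

```python
def billed_hours(runtime_minutes: float, interrupted_by_aws: bool) -> int:
    """Hours charged for one Spot Instance under the hourly rule quoted above.

    Hypothetical helper: it only models the quoted text, not actual AWS invoices.
    """
    if runtime_minutes < 0:
        raise ValueError("runtime_minutes must be non-negative")
    full_hours, partial_minutes = divmod(runtime_minutes, 60)
    if interrupted_by_aws:
        # Interrupted by Amazon EC2: the interrupted (partial) hour is not charged.
        return int(full_hours)
    # Terminated by the user: any partial hour is billed as a full hour.
    return int(full_hours) + (1 if partial_minutes > 0 else 0)


if __name__ == "__main__":
    print(billed_hours(59, interrupted_by_aws=True))   # interrupted after 59 minutes
    print(billed_hours(59, interrupted_by_aws=False))  # user terminated after 59 minutes
```

Under this model, an instance that AWS interrupts late in its hour yields nearly a full hour of uncharged compute, which is what the strategies below try to exploit.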

While several strategies are described on the internet for ensuring high availability at a lower cost, I have yet to come across strategies for applications for which availability is desirable but not a requirement. This might be because:

Strategies are trivial when availability is not a concern.
Very few applications/use-cases can work with low availability.

Since our primary concern is getting massive computational power at the cheapest price point, with less regard for high availability and with flexibility on time to completion, we can afford to:

Compute only when the price is within limits.
Engage in methods that involve higher risk of termination for cheaper compute.

The following bits need a lot more thinking:

Since hourly billing imposes timing considerations, a stateful service would most likely be needed. If we were to maintain a pool of resources, we'd want control over when to terminate them, and we'd want to terminate them as close to the hourly billing mark as possible. Holding a resource up to the billing mark both increases the odds of termination and extracts the most compute time from the instance. At some point we might want some way of biasing new tasks/jobs to fill available compute time slots on existing resources.
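A minimal sketch of the "terminate close to the hourly billing mark" idea, assuming the billing hour starts at instance launch and times are given as seconds (for example, Unix timestamps). The helper names, the `safety_margin` parameter, and its default value are assumptions made for illustration.

```python
def seconds_to_next_billing_mark(launch_time: float, now: float,
                                 period: float = 3600.0) -> float:
    """Seconds left until the instance crosses into its next billed period."""
    elapsed = now - launch_time
    if elapsed < 0:
        raise ValueError("now must not be earlier than launch_time")
    return period - (elapsed % period)


def should_terminate(launch_time: float, now: float,
                     safety_margin: float = 120.0,
                     period: float = 3600.0) -> bool:
    """True when the next billing mark is within safety_margin seconds.

    safety_margin is an assumed buffer so the terminate request lands
    before the next hour starts being billed.
    """
    return seconds_to_next_billing_mark(launch_time, now, period) <= safety_margin


if __name__ == "__main__":
    launch = 0.0
    print(seconds_to_next_billing_mark(launch, 3500.0))  # 100 seconds remain
    print(should_terminate(launch, 3500.0))              # inside the margin
    print(should_terminate(launch, 1800.0))              # mid-hour, keep running
```

A stateful service could evaluate `should_terminate` for each idle instance on every polling cycle and hold the instance otherwise.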

Track the remaining compute time per instance, and fit incoming tasks to the instance with the best fit?
- Do we do best fit or greedy fit? We at least have the advantage of knowing a walltime for the apps.
- This could be a bottomless rabbit hole. Almost like writing a scheduler from scratch.
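To make the best-fit versus greedy question concrete, here is a small sketch that places one task on an instance given the remaining compute time per instance. The function names, the instance ids, and the use of minutes as the unit are hypothetical; "greedy" is interpreted here as first fit, which is an assumption.

```python
from typing import Dict, Optional


def best_fit(task_walltime: float, remaining: Dict[str, float]) -> Optional[str]:
    """Pick the instance that fits the task with the least time left over."""
    leftovers = {iid: left - task_walltime
                 for iid, left in remaining.items()
                 if left >= task_walltime}
    if not leftovers:
        return None  # no instance has enough remaining time
    return min(leftovers, key=leftovers.get)


def first_fit(task_walltime: float, remaining: Dict[str, float]) -> Optional[str]:
    """Greedy alternative: pick the first instance that can hold the task."""
    for iid, left in remaining.items():
        if left >= task_walltime:
            return iid
    return None


if __name__ == "__main__":
    # Remaining minutes in the current billed hour, per (made-up) instance id.
    remaining = {"i-a": 50.0, "i-b": 20.0, "i-c": 35.0}
    print(best_fit(30.0, remaining))   # tightest fit
    print(first_fit(30.0, remaining))  # first instance that fits
```

Best fit leaves larger slots free for longer tasks at the cost of scanning every instance; first fit is cheaper but can fragment the available time.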

The desired time to completion dictates the number of instances/cores required and the walltime per core. Assuming the work divides evenly across cores: totalWallTime / nCores <= Deadline
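As a rough illustration of how a deadline translates into a core count, the sketch below assumes the total walltime can be split evenly across cores with no scheduling overhead. The function name `cores_needed` and the choice of hours as the unit are hypothetical.

```python
import math


def cores_needed(total_walltime_hours: float, deadline_hours: float) -> int:
    """Smallest core count n such that total_walltime / n <= deadline.

    Assumes perfectly divisible work; real jobs will need some headroom.
    """
    if total_walltime_hours < 0:
        raise ValueError("total_walltime_hours must be non-negative")
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    # Always request at least one core, even for an empty workload.
    return max(1, math.ceil(total_walltime_hours / deadline_hours))


if __name__ == "__main__":
    # 100 core-hours of work to finish within 8 hours.
    print(cores_needed(100, 8))
```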

Here are the rules:

bidPrice determines the probability of eviction.
- Separate module to determine the probability of eviction.
- Look at probability from history.
- Smarter prediction can come later.
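One simple way to "look at probability from history" is to count how often past spot prices exceeded a given bid. The sketch below assumes the price history is already available as a plain list of prices; the function name and the sample numbers are made up, and it does not call any AWS API.

```python
from typing import Sequence


def eviction_probability(bid_price: float, price_history: Sequence[float]) -> float:
    """Fraction of historical price samples strictly above the bid.

    A naive empirical estimate: it ignores time ordering and sampling gaps.
    """
    if not price_history:
        raise ValueError("price_history must not be empty")
    above = sum(1 for price in price_history if price > bid_price)
    return above / len(price_history)


if __name__ == "__main__":
    history = [0.10, 0.12, 0.30, 0.11, 0.50]  # illustrative prices only
    print(eviction_probability(0.15, history))
```

Keeping this in its own module, as the rules suggest, would let a smarter predictor replace it later without touching the callers.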

def risk_cost_update(self, bidPrice, currentPrice, acceptable_risk):
    # If the projected wait would miss the deadline, lower the
    # acceptable risk and recompute the bid price.
    if projectedWaitTime > Deadline:
        self.acceptable_risk -= 1
        recompute_price(bidPrice, currentPrice)

def engine(self, bidPrice, acceptable_risk):
    if bidPrice >= currentPrice:
        if instance.state == Running:
            risk_cost_update(self, bidPrice, currentPrice, acceptable_risk)
            check_workQueue(self)
        else:
            risk_
    elif bidPrice < currentPrice:
        if instance.state == Running:
            # Wait for AWS to kill the instance; we get free compute time.
            pass
        if instance.state == Pending:
            risk_cost_update(self, bidPrice, currentPrice, acceptable_risk)

Notes from meeting with Mike:

The only reasonable predictions are those made from looking at daily and weekly price patterns.
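A minimal sketch of how daily and weekly patterns could be extracted, assuming price history is available as (timestamp, price) pairs. The function name and the bucketing by (weekday, hour) are assumptions made for illustration, not something agreed in the meeting.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_price_by_weekday_hour(
    samples: Iterable[Tuple[datetime, float]],
) -> Dict[Tuple[int, int], float]:
    """Average price per (weekday, hour) slot; weekday 0 is Monday."""
    buckets = defaultdict(list)
    for timestamp, price in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(price)
    return {slot: mean(prices) for slot, prices in buckets.items()}


if __name__ == "__main__":
    samples = [  # illustrative values only
        (datetime(2024, 1, 1, 9, 0), 0.25),   # Monday, 09:00
        (datetime(2024, 1, 8, 9, 30), 0.75),  # the following Monday, 09:30
        (datetime(2024, 1, 2, 9, 0), 0.50),   # Tuesday, 09:00
    ]
    for slot, avg in sorted(mean_price_by_weekday_hour(samples).items()):
        print(slot, avg)
```

Slots with a consistently low average would be candidates for scheduling compute; slots with few samples should be treated with caution.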
