spot instance manager implemented in python3
The spot-manager manages the lifecycle of the spot instances. Its duty is to respond to SPOT_INTERRUPTION notifications, and request a new spot, hiding the complexity from the user.
- spot-manager (listens infinitely to SQS queue and handle messages)
- web-server (currently only notify the spot-manager for a new registration)
- spot-image-creator (creates daily ami from spot and notify SQS *** NOT IMPLEMENTED YET ***)
*** NOTE: web-server is the only way to add a managed instance
- user request to add instance lms-chen to web-server 1a. web server publishes SQS message for registration 1b. spot-manager poller receive the message and add lms-chen to a local db (its now managed)
- spot-manager requests a new spot (sir) and updates the sir for lms-chen
- sir is fulfilled by amazon
- new spot launched the iid returned is updated for lms-chen
- spot-manager updates its local db with new iid
- the spot is running cloud-init code (once) for app reconfiguration (tomcat) with the new private ip address
- the spot publish an SQS message with the new public_ip and reboots
- spot-manager receives the new spot message from SQS and changes the ns_record (route53) to refer to the new public ip address
- user can now ping lms-chen.lms.lumosglobal.com NOTE TESTED YET:
- aws will notify SQS with "SPOT INTERRUPTION" 2 minutes before spot will go down
- spot-manager will do again steps 4->8
- vm has self-deploy script
- python 3.6 - 3.8 installed
- cloud-init is working wo errors
- vm has iam cred to send sqs msg
- spot-image-creator
*** NOTE
-
APIGW apache should use private NS to access core devices instead of private IPs
-
zookeeper config server.1=mq1-stg39.lms.lumosglobal.com:2888:3888 server.2=mq2-stg39.lms.lumosglobal.com:2888:3888 server.3=mq3-stg39.lms.lumosglobal.com:2888:3888
-
Kafka connection string should use private NS: zookeeper.connect=mq1-stg39.lms.lumosglobal.com:2181,mq2-stg39.lms.lumosglobal.com:2181,mq3-stg39.lms.lumosglobal.com:2181
-
configure backend (and other core service to use NS for kafka) cp -ar conf/ conf.original grep -R 2181 OLD_ZOO_CS=10.25.1.224:2181,10.25.1.125:2181,10.25.1.28:2181 NEW_ZOO_CS=mq1-stg39.lms.lumosglobal.com:2181,mq2-stg39.lms.lumosglobal.com:2181,mq3-stg39.lms.lumosglobal.com:2181 sed -i "s/${OLD_ZOO_CS}/${NEW_ZOO_CS}/g" backend-messaging.properties
grep -R 9092 OLD_KAFKA_CS=10.25.1.224:9092,10.25.1.125:9092,10.25.1.28:9092 NEW_KAFKA_CS=mq1-stg39.lms.lumosglobal.com:9092,mq2-stg39.lms.lumosglobal.com:9092,mq3-stg39.lms.lumosglobal.com:9092 sed -i "s/${OLD_KAFKA_CS}/${NEW_KAFKA_CS}/g" backend-messaging.properties
NOTE: kafka.rmi.server.hostname uses private ip but is meaningless, so no need to replace
TODO:
- implement spot-image-creator
- implement new-spot-image-created handler to updates the local record (next spot will use new image)