cannlytics/cannabis-data-science

Welcome to the Cannabis Data Science Meetup Group, a team of data scientists from around the world who are advancing cannabis science, one 🧬 molecule at a time! Here you can find many useful notes, notebooks, and video tutorials to help you get, wrangle, and analyze cannabis data with the best of them. Come join the fun every Wednesday at 1:20pm PST / 2:20pm MT / 3:20pm CT / 4:20pm EST. You are always welcome to use the code, watch the videos, and make contributions of your own! Please dive in:

🧑‍🚀 Meetups

We would love to see you at an upcoming meetup! Please bring any research that you may like to share, your thoughts, comments, questions, concerns, or anything at all. All are welcome. See you soon!

Event Day Time
Cannabis Data Science Wednesdays 1:20pm PST / 2:20pm MT / 3:20pm CT / 4:20pm EST

Please peruse the Cannabis Data Science archive and see if you can find anything of value!

Topic Description Video Code
Get Data Join the fun, zany bunch on our first Cannabis Data Science meetup as we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. Video Code
Look at the Data This week we begin to look at the firehose of data that the Washington State traceability system offers to the public. Video Code
Data Wrangling This week we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. Video Code
API Exploration This week we build a simple API to access the firehose of data that the Washington State traceability system offers to the public. Video Code
Competitive Wages This week we estimate competitive wage rates for workers in the Colorado cannabis market with data that the Colorado Marijuana Enforcement Division publishes. Video Code
Competitive Interest Rates This week we estimate competitive interest rates in the Colorado cannabis market using data published by the Colorado Marijuana Enforcement Division. Video Code
Market Concentration This week we begin to estimate market concentration using data that the Washington State traceability system offers to the public. Video Code
Introduction to Forecasting This week we go over the 10 commandments of forecasting and begin to forecast. Video Code
Traceability and Communication This week we begin to talk traceability and discuss how communication is critical when integrating software systems. Video Code
Inflation Part One This week we estimate inflation in the Oregon cannabis market and begin to make forecasts for inflation in 2021. Video Code
Inflation Part Two This week we review our model for inflation in Oregon and go over our forecasts for inflation in 2021. Video Code
Lab Results and Traceability This week we go over the lab testing process and the steps labs take to stay in compliance with traceability systems. Video Code
Waste Analytics This week we discuss and analyze the large amount of biomass waste that is generated by cannabis cultivation. Video Code
Track and Trace This week we begin to wrangle the firehose of data that the Washington State traceability system offers to the public. Video Code
Market Basket Analysis This week we begin to discuss the breakdown of consumer purchases using data that the Washington State traceability system offers to the public. Video Code
Binary Data This week we begin to discuss how to use binary models to analyze cannabis data. Video Code
Crunching Numbers in Oklahoma This week we begin to analyze the data that the Oklahoma Medical Marijuana Authority (OMMA) makes available to the public. Video Code
Better Data, More Forecasts This week we prepare more forecasts for 2021 using public cannabis data. Video Code
Testing and Analysis This week we discuss laboratory testing requirements in the cannabis industry, how lab tests are performed, and how labs operate. Video Code
Laboratory Software This week we talk about software that labs use in their operations. Video Code
Transportation Costs This week we discuss transportation costs and sales in Michigan using data that the CRA makes available to the public. Video Code
Hemp Analysis Part One This week we begin to collect and analyze data from the Midwestern Hemp Database published by the University of Illinois. Video Code
Hemp Analysis Part Two This week we build a simple model to try to predict when hemp may test above the permitted concentration of THC using the Midwestern Hemp Database. Video Code
Cannabinoid Analysis Part One This week we begin to analyze cannabinoid data that the Washington State traceability system offers to the public. Video Code
Cannabinoid Analysis Part Two This week we continue our analysis of the cannabinoid data that the Washington State traceability system offers to the public. Video Code
Residual Solvents This week we discuss residual solvent detections and thresholds using data that the Washington State traceability system offers to the public. Video Code
Cannabis Sales Part One We begin cannabis sales analysis with groundbreaking research by Paul Kitko, who identifies cannabis dispensary purchase patterns using economics and data science. Video Code
Cannabis Sales Part Two We continue to analyze cannabis sales, expanding our analysis to all states with permitted recreational and/or medicinal cannabis. Video Code
Looking at Cannabis Types from Las Vegas After a 'snafu' on Wednesday, we manage to analyze 4 types of cannabis in Washington State, from 'Paris' Las Vegas. Video Code
Mapping Licensees per Capita Part One This week we begin looking at a measure of market competitiveness, licensees per capita, along geographic lines in Oklahoma. Video Code
Barriers to Entry and Market Competitiveness Discussions of market competitiveness and scale led us to a fruitful discussion of barriers to entry, including high capital costs, both financial and human. Video Code
Terpene Analysis Part One Terpene data galore! We discover a treasure trove of public cannabis terpene data published graciously by Connecticut Open Data and calculate the prevalence of various terpenes. Video Code
Terpene Analysis Part Two An extraordinary day of cannabinoid and terpene data crunching followed by data exploration in Massachusetts. Video Code
Measuring Cannabis GDP Today we break new ground by estimating GDP from permitted adult-use cannabis in Massachusetts. Video Code
Equilibrium Analysis Today we conduct a partial equilibrium analysis of the cannabis industry in Massachusetts, estimating prices, wages, and rates of return. Video Code
Model Estimation and Bias We attempt to fit economic models using Massachusetts cannabis data and explore model pitfalls and bias. Video Code
A Brief History of Cannabis QA Albeit impromptu, we manage to discuss the history of quality assurance in the cannabis industry and how it was curiously spurred by the hop latent viroid. Video Code
Forecasting Sales and Inflation Today we apply the 10 commandments of forecasting and utilize a nifty vector autoregressive (VAR) model to forecast cannabis sales in Massachusetts. Video Code
Predicting Market Performance Part One Today we utilize a number of techniques that we have covered to perform a powerful market analysis of Massachusetts' cannabis market and begin to predict market performance in Massachusetts in the coming year, 2022. Video Code
Predicting Market Performance Part Two Today we talk about the history of the structure-conduct-performance paradigm in the industrial organization field of economics and how economic models can be used to analyze regulatory policy, the potential for collusion, and market competition and concentration. Video Code
Predicting Market Performance Part Three We finally complete our market analysis of Massachusetts. We successfully quantify the market, predict its future performance, and discuss the market implications, both past and future. Video Code
Predicting Market Performance Q & A We discuss the market implications of our analysis of Massachusetts and brainstorm ideas for comparative analysis with additional states. Video Code
Comparative Analysis Happy Thanksgiving! Today we begin to compare the structure and performance of cannabis dispensaries in various states with adult-use cannabis. We uncover an interesting pattern that warrants further investigation. Video Code
Economic Surplus The Cannabis Data Science meetup group comes back strong with an impactful discussion of economic surplus in the cannabis market. Video Code
Measuring Market Structure We see through our analysis of cannabis markets by concretely measuring market structure. We can now confidently classify the competitiveness of cannabis markets! Video Code
Forecasting Sales in 2022 Join the best meetup to date as we forecast cannabis sales across the U.S. in 2022. It is hard to conceptualize the staggering amount of money spent on cannabis; however, we do just that and concretize the enormous potential social benefit. Video Code
East Coast vs. West Coast Cannabis We do a deep dive on cannabinoids measured in East Coast and West Coast cannabis and find a structural difference that may stem from differences in how the cannabis is tested. Video Code
Forecasting Models Now 300 strong, the Cannabis Data Science meetup group delivers the first open source, open data forecast of cannabis sales in 2022. Video Code
Predicting Laboratory Profitability You're not going to want to miss this meetup, especially if you're a lab owner. This week at the Cannabis Data Science meetup we calculate possibly the most important metric to your bottom line. Whether or not your lab is in business 5 years from now depends on this metric. Video Code
Processing Cannabinoids and Managing Inconsistencies The lesson of the week: variability matters. We go back in time to discuss the origins of cannabinoid processing, early cannabinoid research, and the development of cannabinoid extraction techniques. Video Code
Data Augmentation and Visualization It is imperative to have the right tools (and data) for the task at hand. The idea is to merge objects by common factors, retaining the data points that you need in your analysis. Once you have augmented data, then you have created value by facilitating analyses that could not otherwise be performed or visualizations that can only be created with the augmented data. Video Code
Statistics with Big Data Calculating statistics on large datasets is difficult, but simple statistics, when they can be calculated, can provide enormous value, yield deep insights, spark ideas for future research, and identify aspects that need closer examination. Video Code
Logistics and Transportation Statistics with Big Data 1,000,000+ more miles this year, easy! Keep on trucking, Washington State couriers! This week we look at the total number of transfers by licensee and by license type as we create various novel maps. Check out these stats and more next week with the Cannabis Data Science meetup group. Video Code
Spatial Analysis Today we begin to answer your long-standing questions about cannabis prices. We gather powerful spatial analysis techniques pioneered by great data scientists from throughout history. Stay tuned as we answer the question: do prices vary by geography (zip code) in Washington State? Video Code
The Effects of Taxes This week we extended our analysis to include taxes! Check out the latest and greatest research on the fundamentals of the cannabis industry. Have any good ideas? Extend the discussion in the comments or on Slack. Video Code
Discussing the State of Cannabis Research From sunny San Diego we talk about the latest and greatest cannabis research and the questions that the Cannabis Data Science team can answer this year with rich, publicly available data that is sitting there like a pile of gold nuggets on a table, free for the taking! Video Code
Natural Language Processing to Extract Data from Human-Written Text Parsing natural language can be complex, but can yield valuable data. We use the spaCy Python package to parse human-entered labels to unlock never-before-crunched data that we then readily analyze with the best-known statistical models. Video Code
Exploratory Data Analysis: Correlations, Deviations, and Regressions Lesson of the day: measurements vary and it is of utmost importance to explore our data to understand how it varies. We learn from the original statisticians how to describe, explore, and then analyze our rich, albeit messy data. Video Code
Study Habit Patterns of Successful Scientists This is a short but sweet story about how we can learn from and be inspired by history, however bizarre it may be. We explore one of the first analyses of the study habits of successful scientists. Video Code
Brand Analysis: Measuring Marketing Can we measure marketing performance for a cannabis brand? This week we estimate market share, penetration, customer value, and a myriad of other marketing metrics for the top cannabis-infused beverage brands in Washington State to prove that yes we can! Grab your favorite beverage and enjoy. Video Code
Game Theory to Model Entry and Exit in Cannabis Markets This week we dip our toes into game theory. We model cannabis production as a game and use it to predict actual entry and exit into cannabis markets in Washington State. It's all fun and games, until someone makes a profit! Video Code
Consumer Choice How will inflation affect the proportion of people who use cannabis and the quantity of cannabis consumed by people who consume cannabis? These are questions that the Cannabis Data Science group is uniquely poised to answer. Join us in modeling both participation and consumption to make unbiased, consistent predictions about cannabis consumption. Video Code
Data Curation: Helping Consumers Access Pesticide Data Because Washington State makes cannabis traceability data available to the public, data scientists can calculate statistics to help consumers. As a proof of concept for other states, such as Oregon, we begin to curate public Washington State pesticide data to make the data easily accessible to consumers. Video Code
Artificial Intelligence: Overcoming Asymmetric Information What is AI? On the holiest of cannabis days, the Cannabis Data Science team cooks up an artificial intelligence to be a curator and custodian of cannabis data to reduce asymmetric information and move one tiny step forwards to a better world for everyone with cannabis. Video Code
Cannabis Consumption: Estimating Consumer Demand Someone call a plumber! The data dam just burst! The Cannabis Data Science team is all hands on deck serving you up the holy grail of cannabis data: cannabis use rates. Get them while they're hot!!! This is the holy grail folks. Today we curate variables related to cannabis use in the USA by state from a 2019-2020 Census survey. Video Code
Fertilizers: Costs, Benefits, and Plant Hardiness What are Nitrogen (N), Phosphorus (P), and Potassium (K) and why should a cannabis cultivator care? This is the exact topic of the day. We follow the money and connect the dots from Saskatchewan to Humboldt County. Video Code
Fertilizer Prices and the US Hemp Canopy Every piece of the puzzle must be filled in, so we diligently collect all public fertilizer price and hemp yield, harvest, and acreage data under the sun. Please enjoy and explore the golden data lying before us. Video Code
Predicting Effects + Aromas Part 1: Preparing & Training Prediction Models Now's better than never, we release the SkunkFx! This is where things get interesting. The Cannabis Data Science Team makes it abundantly clear with the one-and-only cannabis effects prediction model that open-source statistics yields better prediction models than you can build with hundreds of millions of dollars of funding in legacy systems. Please enjoy and please put the statistics to good use. Video Code
Plant Patents: Classifying Cultivars with Terpene Lab Results There's been noise that lab results and strains don't mean anything. We push back hard, as lab results are the key mechanism that top cultivators and lawyers are using to file plant patents. Don't let people sell you hype and miss out on a golden opportunity of a lifetime! Video Code
Predicting Effects + Aromas Part 2: Distinguishing Type and Strain Effects Walk, then run. We clearly outline our theory, the statistical models that we will use, and the intricacies of the data as we prepare to predict effects and aromas of cannabis strains given their lab results, distinguishing different type and strain effects that you may experience in various varieties. Video Code
Enter the Skunk: Using Statistics to Make Predictions Tell your developer(s) about our free effects and aromas API! If you're paying for cannabinoid and/or terpene tests, then you may as well have the effects and aromas of your products predicted 🔮 for free! Simply input cannabinoid and/or terpene data and you will receive a prediction of probable effects and aromas. The cherry on top is that you can report back the actual effects and aromas that characterize your product and the model becomes that much smarter! So, you can make your predictions better over time if you opt-in to providing feedback. Please explore at your pleasure and, hopefully, you are able to find many clever uses for the statistics. Bon appétit! Video Code

🚀 Getting Started

First things first, you can clone the repository:

git clone https://github.com/cannlytics/cannabis-data-science.git

The majority of examples are written in Python. If you install Anaconda, then you can create a virtual environment with all of the packages that you will need:

cd ./cannabis-data-science
conda create --name cds python=3.9
conda activate cds
pip install -r requirements.txt

You should now be off to the races and able to go through most notebooks, following any notebook-specific instructions to download supplementary datasets.
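If a notebook fails on an import, it can help to check which packages from requirements.txt are actually installed in the active environment. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_requirements` is hypothetical, and the parser is deliberately simple (it assumes one plain package requirement per line and skips pip options and URL-based requirements rather than resolving them). It does not compare installed versions against pinned versions.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(lines):
    """Return the names of required packages that are not installed.

    Hypothetical helper: only checks that a distribution with the given
    name is installed, not that its version satisfies the requirement.
    """
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blanks, pip options such as "-r other.txt", and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumes you run this from the repository root
    if path.exists():
        names = missing_requirements(path.read_text().splitlines())
        print("Missing packages:", ", ".join(names) if names else "none")
    else:
        print("requirements.txt not found; run this from the repository root.")
```

Run it with the `cds` environment activated. Any names it prints can usually be installed with pip, though some notebooks may need extra packages beyond requirements.txt.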

🌟 Contributing

Contributions are always welcome! Please submit issues, questions, bugs, fixes, improved-upon code, or anything at all that you want to be addressed. Anyone is welcome to contribute anything. You can refer to the Cannlytics contributing guide for more information about contributing to the Cannlytics ecosystem in general. One of the easiest ways that you can help the group is by giving the repository a ⭐

💬 Join the Cannabis Data Science Slack channel to keep the conversation going!

💖 Support

The Cannabis Data Science meetup group and the accompanying source code are made available with ❤️ and your good will. Please consider making a contribution to help us continue crafting useful code and wrangling new datasets for you. Thank you 🙏

Provider Link
OpenCollective https://opencollective.com/cannlytics-company/donate
💸 PayPal Donation https://cannlytics.page.link/donate
💵 Venmo Donation https://www.venmo.com/u/cannlytics
🪙 Bitcoin donation address 34CoUcAFprRnLnDTHt6FKMjZyvKvQHb6c6
⚡ Ethereum donation address cannlytics.eth

🏛️ License

Copyright (c) 2021-2022 Cannlytics

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Please cite the following if you use the code examples in your research:

@misc{cannlytics2023,
  title={Cannabis Data Science},
  author={Skeate, Keegan and O'Sullivan-Sutherland, Candace},
  journal={https://github.com/cannlytics/cannabis-data-science},
  year={2023}
}
