
yangsheng327/Stock-return-classification


Stock-return-classification

Data: the attached spreadsheet has three tabs, “US”, “JP” and “KR”. These tabs contain time series of stock returns from the US, Japan, and Korea, respectively. These return series are actually residuals from a factor model, so the part of each return that is explained by common risk factors has already been taken out. In addition, the series are not aligned with each other in time (the start date for each has been chosen randomly), and all are normalized to have mean 0 and variance 1. Each series is 252 days long (the number of trading days in a typical US calendar year), and the returns within a series are on consecutive dates.
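Because every series is stated to have 252 observations, mean 0 and variance 1, a quick sanity check after loading each tab can catch layout or parsing mistakes. The sketch below assumes (this is not stated in the data description) that each tab has been loaded into a NumPy array with one row per trading day and one column per stock; the function name `check_series_properties` is hypothetical. It is also unknown whether the variance was normalized with the population (`ddof=0`) or sample (`ddof=1`) convention, so the check uses a tolerance.

```python
import numpy as np

N_DAYS = 252  # trading days per series, as stated in the data description


def check_series_properties(returns: np.ndarray, tol: float = 1e-2) -> np.ndarray:
    """Return a boolean mask, one entry per column, marking series that have
    the expected length, a mean close to 0 and a variance close to 1.

    Assumes (hypothetical layout) that `returns` has shape (N_DAYS, n_stocks):
    rows are consecutive trading days, columns are individual stocks.
    """
    if returns.ndim != 2 or returns.shape[0] != N_DAYS:
        raise ValueError(f"expected shape ({N_DAYS}, n_stocks), got {returns.shape}")
    means = returns.mean(axis=0)
    variances = returns.var(axis=0)  # population variance (ddof=0)
    # The tolerance absorbs rounding and the ddof=0 vs ddof=1 ambiguity.
    return (np.abs(means) < tol) & (np.abs(variances - 1.0) < tol)


# Usage with stand-in data (NOT the real spreadsheet): z-score random columns
# so they mimic the stated normalization, then run the check.
rng = np.random.default_rng(0)
raw = rng.normal(size=(N_DAYS, 5))
fake_tab = (raw - raw.mean(axis=0)) / raw.std(axis=0)
print(check_series_properties(fake_tab))  # expected: all True
```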

Here are some problems we would like you to look at, using your favorite machine learning method(s):

  1.   Can you learn to classify US stocks vs. Japanese stocks vs. Korean stocks from the time series? (You will need to hold out some series as a test set, obviously.) You will have to decide which features to look at, how to learn the classification, and so on. Also consider what additional information you might be able to use to train your classifier.
    
  2.   The next question is this: can you classify the stock time series (for a given country, or all three countries put together) vs. *random, synthetic time series that you generate yourself*? Note that it is up to you to generate the random time series in the control set, as many as you like. But we would like you to make the problem progressively harder for yourself:
    

a. First, let your random time series be iid normal with mean 0 and variance 1 (normalized to a variance of exactly 1 to match the data). This should be a very easy classification task: there are obvious properties of the stock time series that should allow you to distinguish them from iid normals with very high accuracy. What are they?

b. Now, try to generate random time series that match these obvious properties in distribution, and come up with classifiers that are still able to distinguish between the stocks and the synthetic data. You can go through several rounds of this, or do it in a more systematic way. Again, you can use your favorite machine learning techniques.
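For problem 1, one possible starting point (a sketch, not the method used in this repository) is to summarize each 252-day series by a handful of distributional and temporal statistics and fit a simple classifier on a stratified holdout split. The feature set (skewness, excess kurtosis, lag-1 autocorrelation of returns and of squared returns, largest absolute return) and the nearest-centroid classifier are assumptions chosen for illustration, and all function and class names are hypothetical.

```python
import numpy as np


def series_features(x: np.ndarray) -> np.ndarray:
    """Summary statistics for one return series (1-D array)."""
    z = (x - x.mean()) / x.std()
    skew = np.mean(z**3)
    excess_kurt = np.mean(z**4) - 3.0

    def acf1(y: np.ndarray) -> float:
        # Lag-1 sample autocorrelation.
        y = y - y.mean()
        return float(np.dot(y[:-1], y[1:]) / np.dot(y, y))

    return np.array([skew, excess_kurt, acf1(z), acf1(z**2), np.abs(z).max()])


def stratified_split(labels: np.ndarray, test_frac: float, rng: np.random.Generator):
    """Return (train_idx, test_idx), holding out `test_frac` of each class."""
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = max(1, int(round(test_frac * len(idx))))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(train), np.array(test)


class NearestCentroid:
    """Assign each sample to the class whose mean feature vector is closest."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        # Standardize features using training-set statistics only.
        self.mu_ = X.mean(axis=0)
        self.sd_ = X.std(axis=0) + 1e-12
        Z = (X - self.mu_) / self.sd_
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([Z[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mu_) / self.sd_
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]
```

Given a list of series and matching labels ("US", "JP", "KR"), one would build `X = np.stack([series_features(s) for s in series])`, split with `stratified_split`, and compare `NearestCentroid().fit(X[tr], y[tr]).predict(X[te])` with `y[te]`. Swapping in a stronger classifier (for example gradient boosting) only changes the last step.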
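For problem 2a, the control series can be drawn as iid standard normals and then rescaled so that each has sample mean exactly 0 and variance exactly 1, as the task asks. The sketch also computes two statistics that are commonly cited stylized facts of daily stock returns: heavy tails (excess kurtosis) and volatility clustering (autocorrelation of squared returns). Treating these as the "obvious properties" the task refers to is an assumption here, not something the data description states, and the function names are hypothetical.

```python
import numpy as np


def iid_normal_controls(n_series: int, n_days: int = 252, seed: int = 0) -> np.ndarray:
    """Return an (n_series, n_days) array of iid N(0, 1) draws, with each row
    rescaled to sample mean exactly 0 and (population) variance exactly 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_series, n_days))
    x -= x.mean(axis=1, keepdims=True)
    x /= x.std(axis=1, keepdims=True)
    return x


def stylized_facts(x: np.ndarray) -> tuple[float, float]:
    """Excess kurtosis and lag-1 autocorrelation of squared values for a
    single series `x` that is already normalized to mean 0, variance 1."""
    excess_kurt = float(np.mean(x**4) - 3.0)
    sq = x**2 - np.mean(x**2)
    acf_sq = float(np.dot(sq[:-1], sq[1:]) / np.dot(sq, sq))
    return excess_kurt, acf_sq
```

For an iid normal series both statistics are expected to be close to 0, up to sampling noise at 252 observations. If the stock series show clearly positive values, even a threshold on one of them can serve as a first baseline classifier.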
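For problem 2b, one standard way to produce synthetic series with both heavy tails and volatility clustering is a GARCH(1,1) process driven by Student-t innovations. The parameter values below (`omega`, `alpha`, `beta`, `nu`) are placeholders chosen for illustration, not estimates from the US, JP or KR data; in practice they would be fitted to, or randomized around, the real series, and later rounds could target whichever statistics the current classifier still exploits. The generator name `garch_t_series` is hypothetical.

```python
from typing import Optional

import numpy as np


def garch_t_series(
    n_days: int = 252,
    omega: float = 0.05,
    alpha: float = 0.05,
    beta: float = 0.90,
    nu: float = 7.0,
    burn_in: int = 500,
    seed: Optional[int] = None,
) -> np.ndarray:
    """Simulate one GARCH(1,1) path with unit-variance Student-t innovations,
    then rescale it to sample mean 0 and variance 1 like the stock data.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    r[t]      = sqrt(sigma2[t]) * z[t],  z[t] ~ t(nu) scaled to variance 1
    """
    if alpha + beta >= 1.0 or nu <= 2.0:
        raise ValueError("need alpha + beta < 1 and nu > 2 for finite variance")
    rng = np.random.default_rng(seed)
    total = n_days + burn_in
    # Student-t draws rescaled so the innovations have variance 1.
    z = rng.standard_t(nu, size=total) * np.sqrt((nu - 2.0) / nu)
    r = np.empty(total)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(total):
        r[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2  # variance for t + 1
    r = r[burn_in:]  # drop the burn-in so the starting value does not matter
    return (r - r.mean()) / r.std()
```

A control set is then, for example, `np.stack([garch_t_series(seed=s) for s in range(1000)])`. Because each path is rescaled to mean 0 and variance 1, a classifier cannot separate it from the stocks on scale alone and has to rely on finer structure.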
