img

My project for the AlgoExpert software engineering competition winter 2020

Overview

The inspiration for this project is simple: cut down image (and eventually video) storage space. If I could build such an algorithm, companies would be able to drastically cut down their storage space. This is done by changing the image into a text file. Since text files are easier to compress and transmit, a business would simply store the text on a server. On the front end, this text is then converted back to an image. By saving the text of a grayscaled image (instead of a fully colored one), a company (like AlgoExpert) would be able to cut down the storage space it needs. The project is able to recreate the essence of the image but isn't able to capture the subtlety of the relative shading between pixels. However, it is able to create a reasonable image, especially given the constraints.

All in all, this project was really hard to do. Figuring out what I wanted took a lot of time. I spent almost all my free time (when I had the ability) working on the project and trying different approaches to solve this problem. I've rewritten almost the whole project from scratch multiple times to try multiple approaches. However, I'm glad I did it. This project has taught me a lot, and I plan on seeing it through to completion. I'm glad that I could build a working proof of concept. YouTube video: https://www.youtube.com/watch?v=6MP4v6LYO5g

Who Am I?

Here's my information: https://people.rit.edu/dl1683/#projects Looking for summer co-ops. Pls hire me.

What this project does

This project converts a picture into text. While converting an image into text, it changes the image into grayscale. The text of a grayscaled image requires less space and is easier to compress than that of a colored one, since the grayscaling algorithm maps [r,g,b]-->[val] where 0<=r,g,b<=255 and 0<=val<=1. Doing so collapses a 3-dimensional space (r,g,b) into a single dimension. Therein lies the problem: many different [r,g,b] triples produce the same grayscaled value (infinitely many, if the channels are treated as continuous), so there is no unique way back. Exactly reconverting a grayscale image into color (black and white to color) is impossible from a mathematical standpoint. By using machine learning of some sort, however, one can hope to create reasonable approximations of an image.
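To make the loss of information concrete, here is a minimal sketch. It assumes a plain average of the three channels as the grayscaling formula (the formula actually used in this project may differ, for example a weighted one), and `to_gray` is a hypothetical helper name used only for illustration.

```python
def to_gray(rgb):
    """Map an (r, g, b) triple with 0 <= r, g, b <= 255 to a value in [0, 1].

    Assumption: a simple channel average; the project's real formula may differ.
    """
    r, g, b = rgb
    return (r + g + b) / (3 * 255)


# Three very different colors collapse to the same gray value,
# so the gray value alone cannot tell us which color it came from.
red = (255, 0, 0)
green = (0, 255, 0)
blue = (0, 0, 255)

print(to_gray(red), to_gray(green), to_gray(blue))
```

Any recoloring step therefore has to guess, which is why the rest of this README talks about approximations rather than exact recovery.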

Theoretical Challenges

To understand the challenges of this problem, one needs to understand basic linear algebra. As is often taught, to solve for n variables, one needs n linearly independent and consistent equations. By grayscaling, we remove two of the three dimensions, so we are left with one equation for three unknowns. By constraining values we can remove the element of infinite solutions, but we still have to make assumptions about 2 of the 3 variables (r,g,b) to get the 3rd.
While thinking of ways to work around this problem, I came up with visualizing r,g,b as a 3-D space. By dividing each value by 255, we get the vector [1,1,1] for white and [0,0,0] for black. Every other color would have some other vector v = [x,y,z], with each of x, y, z in [0,1]. Every grayscaled value would then be the length of the vector projection of v onto the vector [1,1,1]. Treating the grayscale value as a distance opened up some interesting possibilities, such as pathfinding (probably A*), random forests (maybe Markov chains), and nearest-neighbours regression. These solutions unfortunately could not be implemented for practical reasons discussed below.
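The projection idea can be sketched in a few lines of plain Python. The function names are hypothetical, and dividing by sqrt(3) (the length of the white vector) to bring the result into [0,1] is an assumption made for this sketch, not something taken from the project code.

```python
import math

WHITE = (1.0, 1.0, 1.0)  # the gray axis runs from black (0,0,0) to white (1,1,1)


def projection_length(rgb):
    """Length of the projection of the normalized color onto the gray axis."""
    v = [c / 255 for c in rgb]  # scale each channel into [0, 1]
    dot = sum(a * b for a, b in zip(v, WHITE))
    return dot / math.sqrt(sum(w * w for w in WHITE))


def gray_from_projection(rgb):
    """Rescale the projection length so that white maps to 1 (an assumption)."""
    return projection_length(rgb) / math.sqrt(3)


print(gray_from_projection((255, 255, 255)))
print(gray_from_projection((255, 0, 0)))
```

With this rescaling the value works out to the average of the three normalized channels, so every color lying on the same plane perpendicular to [1,1,1] gets the same gray value.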

Practical challenges

There were some practical challenges to overcome. The biggest were:

College and work

Co-ops for the summer

Training models and lack of resources

Central Limit Theorem

Important Code Bits

The following are the important (and not obvious) parts of my code.

Imgs.py (117-120). This piece of code is responsible for most of the output in terms of color. The randomized gamma parameter is used to ungray the grayed r,g,b values. Unfortunately, randomness also adds a lot of noise. By using constrained optimization we can minimize this problem (done implicitly to calculate the randomness range, alongside trial and error). However, to get rid of the noise, I would have to implement Gaussian denoising and edge detection. Doing so requires computing resources, which I hope to gain by presenting this solution as a proof of concept to our great and groundbreaking faculty.
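As a rough illustration of how a randomized gamma can pull the three channels apart, here is a minimal sketch. The exact formula in Imgs.py is not reproduced here: the per-channel power law, the `gamma_range` default, and the name `ungray` are all assumptions made for this example.

```python
import random


def ungray(gray, rng, gamma_range=(0.8, 1.25)):
    """Turn one gray value in [0, 1] into a hypothetical (r, g, b) triple.

    Each channel gets its own random gamma, so the channels drift apart
    and the pixel picks up some color (and some noise).
    """
    low, high = gamma_range
    channels = []
    for _ in range(3):
        gamma = rng.uniform(low, high)  # random exponent for this channel
        channels.append(round(255 * gray ** gamma))
    return tuple(channels)


rng = random.Random(0)  # seeded so the sketch is repeatable
pixel = ungray(0.5, rng)
print(pixel)
```

In this sketch a narrower `gamma_range` keeps the channels closer together (less noise, less color), while a wider one does the opposite, which is the trade-off the randomness range has to balance.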

Handling a 2D array and a 1D array in 1 loop: Imgs.py (190 and 203-205). I had to read a 1D array (the list of all grayscale values, saved in the most efficient text form possible) and also a 2D array (the image itself). Furthermore, I needed to calculate the corresponding 1D locations of the 2D image neighbors. All of this required some mental and algebraic gymnastics, but I figured it out in the end. To avoid looping over the text (list) and then over the [i,j] 2D array, I went back to counting and treated each position in the list l as a number written in base width (the width of the picture). Then I could recover i,j accordingly, and I needed only 1 for loop for the entire process (about n^2 steps, instead of 2n^2 steps with a lot more memory used, or O(n^4) with the same amount of memory as my current solution). This part was really easy to implement once I figured out what to do and the formulae.
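The index arithmetic can be sketched as follows. The helper names are hypothetical, and the 8-pixel neighborhood is an assumption; the real code in Imgs.py may use a different set of neighbors.

```python
def to_2d(k, width):
    """Convert a flat index k into (row, col), like reading digits in base `width`."""
    return divmod(k, width)


def to_1d(i, j, width):
    """Convert (row, col) back into the flat index."""
    return i * width + j


def neighbor_indices(k, width, height):
    """Flat indices of the in-bounds neighbors (up to 8) of flat index k."""
    i, j = to_2d(k, width)
    result = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # skip the pixel itself
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                result.append(to_1d(ni, nj, width))
    return result


# A single loop over the flat list visits every pixel and its neighbors.
width, height = 4, 3
grays = [k / 11 for k in range(width * height)]  # made-up gray values
for k, value in enumerate(grays):
    i, j = to_2d(k, width)
    around = [grays[n] for n in neighbor_indices(k, width, height)]
```

Because `divmod` gives the row and column directly from the flat index, there is no need for a second pass over a separate 2D structure.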

All of Train.py: Nothing else would be possible without the ability to serialize the models. It let me train over the server and save my laptop.
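A minimal sketch of the save-then-reload workflow, assuming Python's standard `pickle` module (the serialization format Train.py actually uses is not stated here). The dictionary below is only a stand-in for a trained model, and `save_model` / `load_model` are hypothetical helper names.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write the model object to disk as bytes."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a previously saved model object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained estimator; any picklable object works the same way.
model = {"weights": [0.2, 0.5, 0.3], "bias": 0.1}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)       # done once, on the training server
restored = load_model(path)   # done later, on the laptop
```

Only load pickle files you created yourself, since unpickling runs code from the file.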

File overview

I have a lot of files. This is what they do:
ImgEnhance.py: This is the main file to run. Run this to see the solution in action. (Takes a while)

Train.py: Training file. This is what I've used to train on my uni servers. If I can negotiate for more space and resources, I will be adding to and removing from this. It also serializes my models to reduce my computation. The training works by using a list of the grayscale values of the neighbors as the input to produce the output.

Imgs.py: This is the file that actually deals with everything to do with images. It converts to text and vice-versa.

test.py: My testing file. This was built so that I could test specific ideas and modules without running the whole thing. Really saved me a lot of time and computation.

txt.py: This file handles the mapping, encoding, and decoding of the various files. The grayscaling etc. is handled by this file.

new_net.py: My variation on a neural net. I was going to write one to optimize my training for my specific problem. Unfortunately, because of the aforementioned space issues, I had to rely on sklearn. I will continue to build this on my own.

Progress and next steps

I've gotten to a pretty good place in terms of the solutions. The generated image retains all the information and distinguishable edges and the form of the original image. It also maintains the relative shading of the image. However, it can't fully work out the recoloring. I will continue to test different things to get that.
Below are some of the next steps:
Training the neighbors with r,g,b separately: The easiest possible solution I came up with. However, I haven't implemented this because predicting one pixel with one model is already very intensive. Using 3 models per pixel will be too much (both training and testing). I will see if I can get the resources to get this working.

GANs: I'm expecting GANs to be really good for something like this. Training them will be a handful, but they should be exciting. The problem will be making them into a practical solution.

CNNs: Another fun idea. Also very intensive. I don't think they will be as good as GANs.

Random Forests: Something I will try out first. The nature of this problem seems like it will lend itself to random forests.

Gray->color fitting: Something I'm thinking of trying as a way to weigh the solutions. By fitting the grayscale to color, we will run the risk of overfitting. However, it has the potential to work wonders to generate a new image.

Images comparison
I have 3 different images that I tested with. Each has distinct features to test the ability of the recreation function. Smol 3 in particular is a very complex image with changes in shading and hard shapes. The image is recreated pretty faithfully, and we can conclude that by applying some generative functions and gamma corrections, we will get a faithful approximation of the original image. The colored dots in the recreated images are places where the shading of the image changes by a certain amount. I will work on minimizing error there.
Originals
Image 1:
Image 2:

Image 3:

GrayScaled

Image 1

Image 2

Image 3

Recolored (from black and white)
Image 1

Image 2

Image 3

Recommended Things/Bibliography

The following are some of the areas I looked to for inspiration. I looked a lot at Hilbert curves and various mathematical foundations for mapping and creation.
https://ieeexplore.ieee.org/document/6460162

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.323.8826&rep=rep1&type=pdf

https://people.csail.mit.edu/jaffer/Geometry/HSFC

https://math.stackexchange.com/questions/921985/what-is-the-hilbert-curves-equation

http://www.cs.cmu.edu/~christos/PUBLICATIONS/ieee-tkde-hilbert.pdf

https://arxiv.org/pdf/1710.06384.pdf

https://bioconductor.org/packages/devel/bioc/vignettes/HilbertCurve/inst/doc/HilbertCurve.html

https://www.scs.stanford.edu/11au-cs240h/projects/sinis.pdf

https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/image-optimization

https://scikit-learn.org/stable/modules/svm.html#mathematical-formulation

https://towardsdatascience.com/demystifying-deep-image-prior-7076e777e5ba

https://ieeexplore.ieee.org/document/4408909
