Elasticsearch_LSC_SettingUp

This repository sets up Elasticsearch (ES). Install Elasticsearch first, then start it; by default it listens on localhost:9200. Finally, run this script to create the analysers and tokenisers for the fields in ES. Note: copy the [all_synonym.txt] file to the [elastic_folder/config/analysis/] folder before running the script.
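The copy step and a quick check that ES is running can be sketched as below. This is only an illustration: the function names install_synonyms and elasticsearch_is_up are hypothetical and not part of this repo, and elastic_folder stands for wherever Elasticsearch is installed on your machine.

```python
import shutil
import urllib.request
from pathlib import Path


def install_synonyms(synonym_file: str, elastic_folder: str) -> Path:
    """Copy the synonym file into <elastic_folder>/config/analysis/ (hypothetical helper)."""
    target_dir = Path(elastic_folder) / "config" / "analysis"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    return Path(shutil.copy(synonym_file, target_dir))


def elasticsearch_is_up(url: str = "http://localhost:9200") -> bool:
    """Return True if an HTTP server answers at the given URL (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, HTTP errors
        return False
```

For example, install_synonyms("all_synonym.txt", "/path/to/elastic_folder") places the file where the synonym filter expects it, and elasticsearch_is_up() can be called before running the setup script.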

Change Log

09/03/2020

Merge all retrieval functions into MyLibrary_v2.py
Grouping is now performed during indexing (no need to run grouping after searching) --> update ES_text_search.py
Add searching through the REST API to make it faster (search time reduced from 1.5s to below 0.4s)
Try Test_sequence_query.py to measure the time taken for sequential event retrieval

25/02/2020

How to search Sequence Action

Re-index the Elasticsearch database (some time fields were added)
Download the SIFT feature pickle (below) and put it in the data folder
Try it with Test_sequence_query.py
The search flow is depicted in the figures below.

Updated Note

Add NLP processing in the Tag_Event.py library.

Usage: run the function extract_info_from_sentence_full_tag(sentence) to extract a dictionary of past, present, and future actions. If no information is extracted for an action, its entry is an empty dict.

Add the ProcImgLib.py library: groups a list of images into distinct clusters based on their SIFT features, descriptions, and times.

Usage: run the function grouping_image_with_sift_dict(list_images, description_dict, sift_dict) to group images. The result is a list of image clusters.
description_dict can be found in the data folder; sift_dict is provided as a separate download.

Update MyLibrary_v2.py: mostly changes to query generation for the text search case

The mechanism is to generate a dismax list and a filter list, then generate a JSON-format query to pass to ES
generate_list_dismax_part_and_filter_time_from_info(info) takes as input the past/present/future action created in section 1 and creates the dismax list and the filter list.
Pass the dismax list and the filter list to the generate_query_text function to create the JSON query to be used in ES
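To make the dismax-plus-filter idea concrete, here is a minimal sketch of how such a query body could be assembled with the Elasticsearch Query DSL. It is not the code in MyLibrary_v2.py: build_dismax_query is a hypothetical name, the field names "description" and "tags" and the tie_breaker value are assumptions, and only the hour field is taken from the indexing notes.

```python
import json
from typing import List


def build_dismax_query(dismax_clauses: List[dict], filters: List[dict], size: int = 100) -> str:
    """Combine scoring clauses (dis_max) with non-scoring filters into one JSON query string."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # dis_max scores a document by its best-matching clause
                "must": {"dis_max": {"queries": dismax_clauses, "tie_breaker": 0.7}},
                # filters restrict the result set without affecting the score
                "filter": filters,
            }
        },
    }
    return json.dumps(body)


# Hypothetical clauses: field names are assumptions for illustration only
clauses = [
    {"match": {"description": "drinking coffee"}},
    {"match": {"tags": "coffee"}},
]
time_filters = [{"range": {"hour": {"gte": 7, "lte": 10}}}]
query_json = build_dismax_query(clauses, time_filters)
```

The resulting string can be sent as the body of a POST request to the index's _search endpoint.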

Test_sequence_query.py tests searching for a sequence of actions (or a single action) in ES. The flow is to generate the dictionary of past/present/future actions, then search each action and update the following one. Do it TWICE!

Single action: search text as normal
Double action: search the previous action --> group the result images --> generate times --> add the times as a condition to the following action --> do it one more time!
- The result is a ranked list of [[group action 1], [group action 2], average score]
Triple action: search action 1 and action 3 --> group 1, group 3 --> times 1, times 3 --> add to query 2 --> search action 2
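The ordering described above can be sketched as a small planning function. This is an illustration only, not the logic in Test_sequence_query.py: plan_sequence_search is a hypothetical name, and the dictionary keys "past", "present", and "future" are assumed names for the three action slots.

```python
from typing import Dict, List


def plan_sequence_search(info: Dict[str, dict]) -> List[str]:
    """Return the order in which the non-empty actions would be searched."""
    slots = ("past", "present", "future")  # assumed key names
    active = [name for name in slots if info.get(name)]  # skip empty dicts
    if len(active) < 3:
        # Single action: plain text search.
        # Double action: search the earlier one, then constrain the later one by time.
        return active
    # Triple action: search the two outer actions first,
    # then the middle one constrained by the times of both.
    return [active[0], active[2], active[1]]
```

For a sentence with all three actions filled in, the plan is past, future, present, which matches the triple-action flow of searching action 1 and action 3 before action 2.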

ES_text_search.py: update how data is indexed by adding time, day, month, hour, and minute fields to the data.
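As a rough sketch of what adding these fields could look like, the helper below derives them from a Python datetime. The function name add_time_fields and the value formats (ISO string for time, integers for the rest) are assumptions; the actual formats used in ES_text_search.py may differ.

```python
from datetime import datetime


def add_time_fields(doc: dict, timestamp: datetime) -> dict:
    """Return a copy of doc with time, day, month, hour, and minute fields added."""
    enriched = dict(doc)  # leave the caller's document untouched
    enriched.update(
        {
            "time": timestamp.isoformat(),  # assumed format: ISO 8601 string
            "day": timestamp.day,           # assumed: day of month
            "month": timestamp.month,
            "hour": timestamp.hour,
            "minute": timestamp.minute,
        }
    )
    return enriched


# Hypothetical document id, for illustration only
example = add_time_fields({"id": "example.jpg"}, datetime(2020, 2, 25, 9, 30))
```

Storing hour and minute as separate numeric fields makes it straightforward to apply range filters on them at query time.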

05/12/2019

Fix a bug in text search (in the synonym dictionary)
Update all_synonym.txt (add it to the config folder in Elasticsearch)
Update List_synonym_glove_all.pickle
Revise the settings in ES_text_search.py and ES_combined_search.py (add the expand=false setting to the synonym filter)
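For reference, a synonym filter with expand=false could be declared in the index settings roughly as follows. This is a sketch under assumptions: the filter name lsc_synonym, the analyser name lsc_text, and the choice of the standard tokeniser are hypothetical and may not match the settings in ES_text_search.py. The synonyms_path is resolved relative to the Elasticsearch config folder.

```python
import json

# Hypothetical index settings; names are illustrative only
es_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "lsc_synonym": {
                    "type": "synonym",
                    "synonyms_path": "analysis/all_synonym.txt",
                    # expand=false maps every term in a synonym group to the first term
                    "expand": False,
                }
            },
            "analyzer": {
                "lsc_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "lsc_synonym"],
                }
            },
        }
    }
}

settings_json = json.dumps(es_settings)  # body for the index-creation request
```

With expand left at its default of true, each term would instead be expanded to all terms in its group, which produces more tokens per field.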

28/11/2019

Add 3 ES setup files: text, bow, combined. Run ES_[method]_search.py to begin.
Change to MyLibrary_v2 (some functions are renamed as well)
Run Process_BOW to create all necessary files (bow feature, bow dictionary, ...). After running the code, wait for processing and indexing to finish. Then you can play with Elasticsearch.

Run with Django

If you want to run it with Django, use this code in views.py:

import requests

import MyLibrary_v2 as mylib_v2

# query is the query string
# List_synonym is the synonym file provided in the repo
# result holds two pieces of information for each image: its path ('img') and its name ('title')
data = mylib_v2.generate_es_text(q=query)
headers = {"Content-Type": "application/json"}
response = requests.post("http://localhost:9200/lsc2019/_search", headers=headers, data=data)

if response.status_code == 200:
    response_json = response.json()  # parse the JSON response body into a dict
    id_images = [d["_source"]["id"] for d in response_json["hits"]["hits"]]
    id_images_full_path = mylib_v2.add_folder_to_id_images('./LSC_Data/', id_images)
    result = [{'img': path, 'title': name} for path, name in zip(id_images_full_path, id_images)]
    # print("Searching time: " + str(response_json["took"]) + "ms")
else:
    # the search failed: return no results
    result = []
