ComplexCity/jsonexplore

##What these scripts do

The purpose of the scripts in this repository is to explore an input JSON file to give insight into its content: its architecture, its fields and their different values.

The main script is jsonexplore.py. It explores the given JSON file and builds a tree of Obj objects from it using the ObjBuilder. Then it renders the tree of Obj using a printer. Three basic printers are available at this time:

  • The ObjTextPrinter simply outputs a string describing the tree, printed on the console
  • The ObjJsonPrinter outputs a JSON dictionary, written to the output obj.json file
  • The ObjHtmlPrinter takes the output JSON file as its input to display a D3.js tree

##The Obj class

An Obj is created for each key of the given JSON dictionary.

###Properties

  • name: the name of the field or "." for root

  • path: the path of the field from root

  • level: the distance from root

  • type: the type of its value(s) (e.g. dict, list, unicode, int…)

  • values: a dictionary associating to each existing value the number of times this value is found (see also the methods).

  • nb_times_it_exists: if this field is part of an item in a list, it says how many items in the list have this field

  • nb_times_it_is_expected: if this field is part of an item in a list, it says how many items are in the list, that is to say how many times this field should appear

  • nb_items: in case of a list, it says how many items it contains (basically it is the length of the list). If this list exists as an item of a parent list, it says the max length of the list.

  • nb_items_min: in case of a list existing as an item of a parent list, it says the min length of the list

  • children: in case of a dictionary, it is the list of its keys, each one represented by an Obj instance
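
To illustrate how these properties relate to each other, here is a minimal sketch in Python 3. It is not the repository's implementation: `ObjSketch` and `build_from_items` are hypothetical names, the path format is an assumption, and only a list of flat dictionaries with scalar values is handled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ObjSketch:
    """Hypothetical stand-in for Obj, limited to properties listed above."""
    name: str
    path: str
    level: int
    type: str = ""
    values: Counter = field(default_factory=Counter)
    nb_times_it_exists: int = 0
    nb_times_it_is_expected: int = 0
    nb_items: int = 0
    children: list = field(default_factory=list)


def build_from_items(name, items, path=".", level=0):
    """Merge a list of flat dictionaries into one ObjSketch per key."""
    parent = ObjSketch(name=name, path=path, level=level,
                       type="list", nb_items=len(items))
    by_key = {}
    for item in items:
        for key, value in item.items():
            child = by_key.get(key)
            if child is None:
                child = ObjSketch(
                    name=key,
                    path="%s/%s" % (path, key),  # assumed path format
                    level=level + 1,
                    type=type(value).__name__,
                    # every item of the list is expected to carry the field
                    nb_times_it_is_expected=len(items),
                )
                by_key[key] = child
                parent.children.append(child)
            child.values[value] += 1       # count occurrences of each value
            child.nb_times_it_exists += 1  # count items that have the field
    return parent


statuses = build_from_items(
    "statuses",
    [{"favorited": False, "bmiddle_pic": "a.jpg"}, {"favorited": False}],
)
for child in statuses.children:
    print(child.name, child.nb_times_it_exists, child.nb_times_it_is_expected)
```

In this sketch a field whose two counters differ, such as `bmiddle_pic` here, is one that is missing from some items of the list.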

###Methods

  • is_optional(): if the field is part of an item in a list, it says whether this field is sometimes missing from the other items of the list, in other words whether nb_times_it_exists is different from nb_times_it_is_expected. If it is optional, it returns True. Otherwise, it returns False.

  • get_values_summary(): depending on the values property, it returns a string or a list offering a more comprehensible version of the values dictionary:

    • "'value'" if there is only one value, used only once

    • "Always 'value'" if there is only one value, used more than once

    • "Always empty" if the value is always an empty string

    • "All different values" if each item has a different value

    • "Almost all different values (each value appear from x to y times)" if every value is used less than 5 times

    • otherwise, a list where each value is represented by a dictionary with 2 keys:

      {
      	'value': the_value,
      	'count': the_number_of_times_this_value_is_used
      }
      

    This method is used by the ObjJsonPrinter, and consequently by the ObjHtmlPrinter to display the values.

  • get_sample_value(): it returns the first value that is not an empty string, or None if no such value is found.
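
The rules above can be sketched as standalone Python 3 functions. This is an illustration under assumptions, not the repository's code: the functions take the counters as arguments instead of being methods of Obj, the order in which the rules are tested is a guess, and the threshold of 5 is exposed as a hypothetical `threshold` parameter.

```python
from collections import Counter


def is_optional(nb_times_it_exists, nb_times_it_is_expected):
    """True when the field is missing from at least one item of the list."""
    return nb_times_it_exists != nb_times_it_is_expected


def get_values_summary(values, threshold=5):
    """Summarize a non-empty Counter mapping each value to its count."""
    counts = list(values.values())
    if len(values) == 1:
        value, count = next(iter(values.items()))
        if value == "":
            return "Always empty"  # assumed to take priority over the rules below
        if count == 1:
            return "'%s'" % value
        return "Always '%s'" % value
    if all(count == 1 for count in counts):
        return "All different values"
    if max(counts) < threshold:
        return ("Almost all different values "
                "(each value appear from %d to %d times)"
                % (min(counts), max(counts)))
    # Fall back to the detailed list of value/count dictionaries
    return [{"value": v, "count": c} for v, c in values.items()]


def get_sample_value(values):
    """Return the first value that is not an empty string, else None."""
    for value in values:
        if value != "":
            return value
    return None


print(is_optional(6, 10))
print(get_values_summary(Counter({0: 7, 1: 3})))
print(get_sample_value(Counter({"": 2, "x": 1})))
```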

##The ObjTextPrinter

Example of output:

|- .: a Dict composed of:
	|- statuses: a List composed of 10 items like:
		|- [x]: a Dict composed of:
			|- attitudes_count: int - values:
				'0' [7]
				'1' [3]
			|- bmiddle_pic (Optional: only 6 value(s) over the 10 items): unicode - values:
				'http://ww3.sinaimg.cn/bmiddle/4bed07b4jw1ef4pyzb9kzj20uh15ok33.jpg' [1]
				'http://ww4.sinaimg.cn/bmiddle/664b3fe9jw1ef4px8ca0uj20bv0ft40f.jpg' [1]
				'http://ww4.sinaimg.cn/bmiddle/9f767fc7jw1ef4prguk4tj20hs0npdjc.jpg' [1]
				'http://ww3.sinaimg.cn/bmiddle/4d3ffe9ejw1ef4q0uogdtj20xc18gguu.jpg' [1]
				'http://ww1.sinaimg.cn/bmiddle/df92068ejw1ef4pqnefs2j20xc18gjyx.jpg' [1]
				'http://ww4.sinaimg.cn/bmiddle/8bf1e5b4jw1ef4pxgzavsj20f00qo756.jpg' [1]
			|- comments_count: int - values:
				'0' [9]
				'1' [1]
			|- distance: int - values:
				'400' [1]
				'1700' [1]
				'1800' [3]
				'1900' [1]
				'2000' [2]
				'1400' [1]
				'1500' [1]
			|- favorited: bool (always: 'False')
			|- in_reply_to_user_id: unicode (always: '')
			|- mid: unicode - values:
				'3696007882016559' [1]
				'3696009920372953' [1]
				'3696010247357290' [1]
				'3696007861039896' [1]
				'3696010012795278' [1]
				'3696009530197458' [1]
				'3696008955203962' [1]
				'3696009429891580' [1]
				'3696008082926243' [1]
				'3696010490692288' [1]
			|- reposts_count: int (always: '0')
			|- truncated: bool (always: 'False')
			|- user: a Dict composed of:
				|- allow_all_comment: bool - values:
					'False' [3]
					'True' [7]
				|- location: unicode - values:
					'上海 普陀区' [5]
					'湖北 襄阳' [1]
					'其他' [1]
					'上海 黄浦区' [1]
					'上海 闸北区' [1]
	|- total_number: int (value: '129751')
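
A layout similar to the output above can be produced by a short recursive function. The following Python 3 sketch is an assumption about the approach, not the ObjTextPrinter itself: `render` is a hypothetical name, nodes are plain dictionaries rather than Obj instances, and lists, optional fields and the single-use "value:" form are left out.

```python
def render(node, depth=0):
    """Return the lines describing `node` and its descendants, tab-indented."""
    indent = "\t" * depth
    values = node.get("values", {})
    if node.get("children"):
        # A dictionary: print its header, then each key one level deeper
        lines = ["%s|- %s: a Dict composed of:" % (indent, node["name"])]
        for child in node["children"]:
            lines.extend(render(child, depth + 1))
    elif len(values) == 1:
        # A single distinct value is shown inline
        value = next(iter(values))
        lines = ["%s|- %s: %s (always: '%s')"
                 % (indent, node["name"], node["type"], value)]
    else:
        # Several distinct values: one line per value with its count
        lines = ["%s|- %s: %s - values:" % (indent, node["name"], node["type"])]
        for value, count in values.items():
            lines.append("%s\t'%s' [%d]" % (indent, value, count))
    return lines


tree = {
    "name": ".",
    "children": [
        {"name": "favorited", "type": "bool", "values": {False: 10}},
        {"name": "comments_count", "type": "int", "values": {0: 9, 1: 1}},
    ],
}
print("\n".join(render(tree)))
```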

##The ObjHtmlPrinter

You can see a working example here.

This visualisation is based on D3.js.
