# Standard-library and SQLAlchemy imports used below. Database, Batch,
# DigitalObject and the module-level __description__, __copyright__,
# __author__, __email__ and __version__ are assumed to be defined or
# imported elsewhere in this package.
from argparse import ArgumentParser
from logging import (DEBUG, INFO, FileHandler, Formatter, StreamHandler,
                     getLogger)
from re import compile as re_compile

from sqlalchemy import Table


def main():
    # Command-line interface.
    parser = ArgumentParser(
        description="{description}".format(description=__description__),
        epilog="{copyright}; written by {author} <{email}>.".format(
            copyright=__copyright__, author=__author__, email=__email__))
    parser.add_argument("-v", help="See the version of this program",
                        action="version", version=__version__)
    parser.add_argument('-b', '--verbose', help="set verbose logging",
                        action='store_const', dest='log_level', const=INFO)
    parser.add_argument('-d', '--debugging', help="set debugging logging",
                        action='store_const', dest='log_level', const=DEBUG)
    parser.add_argument('-l', '--log_loc', help="save logging to a file",
                        action="store_const", dest="log_loc",
                        const='./{progname}.log'.format(progname=__file__))
    parser.add_argument('--db_url', help="Enter a db url", action='store')
    parser.add_argument('--root', help="Enter the root of the repository",
                        action='store')
    parser.add_argument('--object_pattern',
                        help="Enter the regex pattern to match an object",
                        action='store')
    parser.add_argument('--page_pattern',
                        help="Enter the regex pattern to match a page",
                        action='store')
    parser.add_argument('accessions', nargs="*", action='store',
                        help="Enter 1 or more accession "
                             "identifiers to process")
    args = parser.parse_args()

    # Logging: always to the console, optionally to a file as well.
    log_format = Formatter("[%(levelname)s] %(asctime)s = %(message)s",
                           datefmt="%Y-%m-%dT%H:%M:%S")
    global logger
    logger = getLogger("lib.uchicago.repository.logger")
    ch = StreamHandler()
    ch.setFormatter(log_format)
    try:
        logger.setLevel(args.log_level)
    except TypeError:
        # Neither -b nor -d was given, so log_level is None.
        logger.setLevel(INFO)
    if args.log_loc:
        fh = FileHandler(args.log_loc)
        fh.setFormatter(log_format)
        logger.addHandler(fh)
    logger.addHandler(ch)

    # Reflect the existing tables and select the files to process.
    db = Database(args.db_url, ['record', 'file'])

    class Record(db.base):
        __table__ = Table('record', db.metadata, autoload=True)

    class File(db.base):
        __table__ = Table('file', db.metadata, autoload=True)

    query = db.session.query(File).filter(
        File.accession.in_(args.accessions))
    if args.root:
        batch = Batch(args.root, query=query)
        items = batch.find_items(from_db=True)
        batch.set_items(items)
    else:
        raise ValueError("need to include a root")

    try:
        all_objects = []
        for item in batch.get_items():
            accession = item.find_file_accession()
            item.set_accession(accession)
            canon = item.find_canonical_filepath()
            item.set_canonical_filepath(canon)
            # Raw strings keep \w from being read as a string escape.
            search_pattern = item.find_matching_object_pattern(
                re_compile(r"(mvol)/(\w{4})/(\w{4})/(\w{4})/"
                           r"(mvol)-(\w{4})-(\w{4})-(\w{4})"))
            if search_pattern.status:
                potential_identifier = '-'.join(
                    search_pattern.data.groups())
                is_an_object_already_present = [
                    x for x in all_objects
                    if x.identifier == potential_identifier]
                if is_an_object_already_present:
                    logger.debug("found this id already")
                else:
                    logger.debug("this id is new!")
                    new_object = DigitalObject(potential_identifier)
                    all_objects.append(new_object)
                logger.debug(potential_identifier)
        return 0
    except KeyboardInterrupt:
        logger.error("Program aborted manually")
        return 131
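

# Sketch, not part of the original script: main() above rescans all_objects
# for every item to decide whether an identifier has been seen, which grows
# quadratically with the number of objects. Assuming identifiers are plain
# (hashable) strings, a dict keyed by identifier answers the same question
# in constant time. The name `remember_identifier` and its `factory`
# parameter are hypothetical; in main() the factory would presumably be
# DigitalObject.
def remember_identifier(seen, identifier, factory):
    """Return (obj, is_new); build obj with factory on first sight only."""
    if identifier in seen:
        return seen[identifier], False
    seen[identifier] = factory(identifier)
    return seen[identifier], True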
# Standard-library imports used below. Batch and the module-level
# __description__, __copyright__, __author__, __email__ and __version__
# are assumed to be defined or imported elsewhere in this package.
from argparse import ArgumentParser
from hashlib import sha256
from logging import (DEBUG, INFO, FileHandler, Formatter, StreamHandler,
                     getLogger)
from sys import argv, stdout


def main():
    # Command-line interface.
    parser = ArgumentParser(
        description="{description}".format(description=__description__),
        epilog="{copyright}; written by {name} <{email}> "
               "University of Chicago".format(
                   copyright=__copyright__, name=__author__,
                   email=__email__))
    parser.add_argument("-v", help="See the version of this program",
                        action="version", version=__version__)
    parser.add_argument('-b', '--verbose', help="set verbose logging",
                        action='store_const', dest='log_level', const=INFO)
    parser.add_argument('-d', '--debugging', help="set debugging logging",
                        action='store_const', dest='log_level', const=DEBUG)
    parser.add_argument('-l', '--log_loc', help="save logging to a file",
                        action="store_const", dest="log_loc",
                        const='./{progname}.log'.format(progname=argv[0]))
    parser.add_argument("location_root",
                        help="Enter the root of the directory path",
                        action="store")
    parser.add_argument("directory_path",
                        help="Enter a directory that you need to work on",
                        action='store')
    args = parser.parse_args()

    # Logging: always to the console, optionally to a file as well.
    log_format = Formatter("[%(levelname)s] %(asctime)s = %(message)s",
                           datefmt="%Y-%m-%dT%H:%M:%S")
    global logger
    logger = getLogger("lib.uchicago.repository.logger")
    ch = StreamHandler()
    ch.setFormatter(log_format)
    try:
        logger.setLevel(args.log_level)
    except TypeError:
        # Neither -b nor -d was given, so log_level is None.
        logger.setLevel(INFO)
    if args.log_loc:
        fh = FileHandler(args.log_loc)
        fh.setFormatter(log_format)
        logger.addHandler(fh)
    logger.addHandler(ch)

    try:
        b = Batch(args.location_root, directory=args.directory_path)
        generator_object = b.find_items(from_directory=True)
        logger.debug(generator_object)
        b.set_items(generator_object)

        # Emit one SQL transaction on stdout, one insert per readable file.
        stdout.write("begin transaction;\n")
        for a_file in b.get_items():
            if a_file.test_readability():
                file_hash = a_file.find_hash_of_file(sha256)
                mime = a_file.find_file_mime_type()
                size = a_file.find_file_size()
                accession = a_file.find_file_accession()
                a_file.set_file_mime_type(mime)
                a_file.set_file_size(size)
                a_file.set_hash(file_hash)
                a_file.set_accession(accession)
                # NOTE: values are interpolated without any escaping.
                out_string = (
                    "insert into file (filepath,accession,"
                    "mimetype,size,checksum) values ("
                    "\"{path}\",\"{accession}\",\"{mimetype}\","
                    "{filesize},\"{filehash}\");\n").format(
                        path=a_file.filepath,
                        accession=a_file.get_accession(),
                        mimetype=a_file.get_file_mime_type(),
                        filesize=a_file.get_file_size(),
                        filehash=a_file.get_hash())
                stdout.write(out_string)
            else:
                logger.error("{path} could not be read".format(
                    path=a_file.filepath))
        stdout.write("commit;\n")
        return 0
    except KeyboardInterrupt:
        logger.warning("Program aborted manually")
        return 131
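

# Sketch, not part of the original script: main() above interpolates file
# paths straight into the generated insert statements, so a path containing
# a quote character would yield malformed SQL. Assuming the target database
# follows standard SQL string-literal rules (single quotes, with embedded
# quotes doubled), the statement could be built as below. The names
# `sql_literal` and `build_insert` are hypothetical helpers. When talking to
# a database directly rather than writing a script to stdout, parameterized
# queries are generally the safer choice.
def sql_literal(value):
    """Render value as a single-quoted SQL literal, doubling any quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(filepath, accession, mimetype, size, checksum):
    """Return one insert statement for the file table, newline-terminated."""
    return ("insert into file (filepath,accession,mimetype,size,checksum) "
            "values ({0},{1},{2},{3:d},{4});\n").format(
                sql_literal(filepath), sql_literal(accession),
                sql_literal(mimetype), int(size), sql_literal(checksum))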