Example #1
File: logs.py Project: ese/paasta
def tail_paasta_logs(service, levels, components, clusters, raw_mode=False):
    """Sergeant function for spawning off all the right log tailing functions.

    NOTE: This function spawns concurrent processes and doesn't necessarily
    worry about cleaning them up! That's because we expect to just exit the
    main process when this function returns (as main() does). Someone calling
    this function directly with something like "while True: tail_paasta_logs()"
    may be very sad.

    NOTE: We try pretty hard to suppress KeyboardInterrupts to prevent big
    useless stack traces, but it turns out to be non-trivial and we fail ~10%
    of the time. We decided we could live with it and we're shipping this to
    see how it fares in real world testing.

    Here are some things we read about this problem:
    * http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
    * http://jtushman.github.io/blog/2014/01/14/python-%7C-multiprocessing-and-interrupts/
    * http://bryceboe.com/2010/08/26/python-multiprocessing-and-keyboardinterrupt/

    We could also try harder to terminate processes from more places. We could
    use process.join() to ensure things have a chance to die. We punted these
    things.

    It's possible this whole multiprocessing strategy is wrong-headed. If you
    are reading this code to curse whoever wrote it, see discussion in
    PAASTA-214 and https://reviewboard.yelpcorp.com/r/87320/ and feel free to
    implement one of the other options.
    """
    scribe_envs = set()
    for cluster in clusters:
        scribe_envs.update(determine_scribereader_envs(components, cluster))
    log.info("Would connect to these envs to tail scribe logs: %s" %
             scribe_envs)
    queue = Queue()
    spawned_processes = []
    for scribe_env in scribe_envs:
        # Tail stream_paasta_<service> for build or deploy components
        if any([component in components for component in DEFAULT_COMPONENTS]):
            # Start a thread that tails scribe in this env
            kw = {
                'scribe_env': scribe_env,
                'stream_name': get_log_name_for_service(service),
                'service': service,
                'levels': levels,
                'components': components,
                'clusters': clusters,
                'queue': queue,
                'filter_fn': paasta_log_line_passes_filter,
            }
            process = Process(target=scribe_tail, kwargs=kw)
            spawned_processes.append(process)
            process.start()

        # Tail Marathon logs for the relevant clusters for this service
        if 'marathon' in components:
            for cluster in clusters:
                kw = {
                    'scribe_env': scribe_env,
                    'stream_name': 'stream_marathon_%s' % cluster,
                    'service': service,
                    'levels': levels,
                    'components': components,
                    'clusters': [cluster],
                    'queue': queue,
                    'parse_fn': parse_marathon_log_line,
                    'filter_fn': marathon_log_line_passes_filter,
                }
                process = Process(target=scribe_tail, kwargs=kw)
                spawned_processes.append(process)
                process.start()

        # Tail Chronos logs for the relevant clusters for this service
        if 'chronos' in components:
            for cluster in clusters:
                kw = {
                    'scribe_env': scribe_env,
                    'stream_name': 'stream_chronos_%s' % cluster,
                    'service': service,
                    'levels': levels,
                    'components': components,
                    'clusters': [cluster],
                    'queue': queue,
                    'parse_fn': parse_chronos_log_line,
                    'filter_fn': chronos_log_line_passes_filter,
                }
                process = Process(target=scribe_tail, kwargs=kw)
                spawned_processes.append(process)
                process.start()
    # Pull things off the queue and output them. If any thread dies we are no
    # longer presenting the user with the full picture so we quit.
    #
    # This is convenient for testing, where a fake scribe_tail() can emit a
    # fake log and exit. Without the thread aliveness check, we would just sit
    # here forever even though the threads doing the tailing are all gone.
    #
    # NOTE: A noisy tailer in one scribe_env (such that the queue never gets
    # empty) will prevent us from ever noticing that another tailer has died.
    while True:
        try:
            # This is a blocking call with a timeout for a couple reasons:
            #
            # * If the queue is empty and we get_nowait(), we loop very tightly
            # and accomplish nothing.
            #
            # * Testing revealed a race condition where print_log() is called
            # and even prints its message, but this action isn't recorded on
            # the patched-in print_log(). This resulted in test flakes. A short
            # timeout seems to soothe this behavior: running this test 10 times
            # with a timeout of 0.0 resulted in 2 failures; running it with a
            # timeout of 0.1 resulted in 0 failures.
            #
            # * There's a race where thread1 emits its log line and exits
            # before thread2 has a chance to do anything, causing us to bail
            # out via the Queue Empty and thread aliveness check.
            #
            # We've decided to live with this for now and see if it's really a
            # problem. The threads in test code exit pretty much immediately
            # and a short timeout has been enough to ensure correct behavior
            # there, so IRL with longer start-up times for each thread this
            # will surely be fine.
            #
            # UPDATE: Actually this is leading to a test failure rate of about
            # 1/10 even with timeout of 1s. I'm adding a sleep to the threads
            # in test code to smooth this out, then pulling the trigger on
            # moving that test to integration land where it belongs.
            line = queue.get(True, 0.1)
            print_log(line, levels, raw_mode)
        except Empty:
            try:
                # If there's nothing in the queue, take this opportunity to make
                # sure all the tailers are still running.
                running_processes = [tt.is_alive() for tt in spawned_processes]
                if not running_processes or not all(running_processes):
                    log.warn(
                        'Quitting because I expected %d log tailers to be alive but only %d are alive.'
                        % (
                            len(spawned_processes),
                            running_processes.count(True),
                        ))
                    for process in spawned_processes:
                        if process.is_alive():
                            process.terminate()
                    break
            except KeyboardInterrupt:
                # Die peacefully rather than printing N threads worth of stack
                # traces.
                #
                # This extra nested catch is because it's pretty easy to be in
                # the above try block when the user hits Ctrl-C which otherwise
                # dumps a stack trace.
                log.warn('Terminating.')
                break
        except KeyboardInterrupt:
            # Die peacefully rather than printing N threads worth of stack
            # traces.
            log.warn('Terminating.')
            break
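
The docstring and inline comments above describe the tailing strategy in prose: each tailer runs in its own process and feeds a shared queue, and the consumer blocks on get() with a short timeout so it can periodically check whether any tailer has died. The following is a minimal, self-contained sketch of that same producer/consumer pattern; fake_tail and the env names are hypothetical stand-ins for scribe_tail(), not part of paasta.

from multiprocessing import Process, Queue
from queue import Empty
import time


def fake_tail(name, queue):
    # Stand-in for scribe_tail(): emit a few lines, then exit,
    # like a tailer whose stream has ended.
    for i in range(3):
        queue.put("%s: line %d" % (name, i))
        time.sleep(0.05)


if __name__ == '__main__':
    queue = Queue()
    workers = [Process(target=fake_tail, args=('env%d' % n, queue)) for n in range(2)]
    for worker in workers:
        worker.start()

    while True:
        try:
            # Block with a short timeout instead of spinning on get_nowait().
            line = queue.get(True, 0.1)
            print(line)
        except Empty:
            # Queue is drained; if any tailer has died, stop the rest and quit.
            if not all(worker.is_alive() for worker in workers):
                for worker in workers:
                    if worker.is_alive():
                        worker.terminate()
                break
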
Example #2
def test_get_log_name_for_service():
    service = 'foo'
    expected = 'stream_paasta_%s' % service
    assert utils.get_log_name_for_service(service) == expected
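
The test above only pins down the default stream name. A minimal implementation consistent with it might look like the sketch below; the prefix branch is an assumption inferred from the get_log_name_for_service(service, prefix='app_output') calls in the ScribeLogReader example that follows, and is not verified against the paasta source.

def get_log_name_for_service(service, prefix=None):
    # Default: the test above expects 'stream_paasta_<service>'.
    if prefix:
        # Assumed behavior for calls like prefix='app_output'.
        return 'stream_paasta_%s_%s' % (prefix, service)
    return 'stream_paasta_%s' % service
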
Example #3
File: logs.py Project: white105/paasta
class ScribeLogReader(LogReader):
    SUPPORTS_TAILING = True
    SUPPORTS_LINE_COUNT = True
    SUPPORTS_TIME = True

    COMPONENT_STREAM_INFO = {
        'default':
        ScribeComponentStreamInfo(
            per_cluster=False,
            stream_name_fn=get_log_name_for_service,
            filter_fn=paasta_log_line_passes_filter,
            parse_fn=None,
        ),
        'stdout':
        ScribeComponentStreamInfo(
            per_cluster=False,
            stream_name_fn=lambda service: get_log_name_for_service(
                service, prefix='app_output'),
            filter_fn=paasta_app_output_passes_filter,
            parse_fn=None,
        ),
        'stderr':
        ScribeComponentStreamInfo(
            per_cluster=False,
            stream_name_fn=lambda service: get_log_name_for_service(
                service, prefix='app_output'),
            filter_fn=paasta_app_output_passes_filter,
            parse_fn=None,
        ),
        'marathon':
        ScribeComponentStreamInfo(
            per_cluster=True,
            stream_name_fn=lambda service, cluster: 'stream_marathon_%s' %
            cluster,
            filter_fn=marathon_log_line_passes_filter,
            parse_fn=parse_marathon_log_line,
        ),
        'chronos':
        ScribeComponentStreamInfo(
            per_cluster=True,
            stream_name_fn=lambda service, cluster: 'stream_chronos_%s' %
            cluster,
            filter_fn=chronos_log_line_passes_filter,
            parse_fn=parse_chronos_log_line,
        ),
    }

    def __init__(self, cluster_map):
        super(ScribeLogReader, self).__init__()

        if scribereader is None:
            raise Exception(
                "scribereader package must be available to use scribereader log reading backend"
            )
        self.cluster_map = cluster_map

    def run_code_over_scribe_envs(self, clusters, components, callback):
        """Iterates over the scribe environments for a given set of clusters and components, executing
        functions for each component

        :param clusters: The set of clusters
        :param components: The set of components
        :param callback: The callback function. Gets called with (component_name, stream_info, scribe_env, cluster)
                         The cluster argument is only set when the component's stream info is per_cluster
        """
        scribe_envs: Set[str] = set()
        for cluster in clusters:
            scribe_envs.update(
                self.determine_scribereader_envs(components, cluster))
        log.info("Would connect to these envs to tail scribe logs: %s" %
                 scribe_envs)

        for scribe_env in scribe_envs:
            # These components all get grouped in one call for backwards compatibility
            grouped_components = {'build', 'deploy', 'monitoring'}

            if any(
                [component in components for component in grouped_components]):
                stream_info = self.get_stream_info('default')
                callback('default', stream_info, scribe_env, cluster=None)

            non_defaults = set(components) - grouped_components
            for component in non_defaults:
                stream_info = self.get_stream_info(component)

                if stream_info.per_cluster:
                    for cluster in clusters:
                        callback(component,
                                 stream_info,
                                 scribe_env,
                                 cluster=cluster)
                else:
                    callback(component, stream_info, scribe_env, cluster=None)

    def get_stream_info(self, component):
        if component in self.COMPONENT_STREAM_INFO:
            return self.COMPONENT_STREAM_INFO[component]
        else:
            return self.COMPONENT_STREAM_INFO['default']

    def tail_logs(self,
                  service,
                  levels,
                  components,
                  clusters,
                  instances,
                  raw_mode=False):
        """Sergeant function for spawning off all the right log tailing functions.

        NOTE: This function spawns concurrent processes and doesn't necessarily
        worry about cleaning them up! That's because we expect to just exit the
        main process when this function returns (as main() does). Someone calling
        this function directly with something like "while True: tail_paasta_logs()"
        may be very sad.

        NOTE: We try pretty hard to suppress KeyboardInterrupts to prevent big
        useless stack traces, but it turns out to be non-trivial and we fail ~10%
        of the time. We decided we could live with it and we're shipping this to
        see how it fares in real world testing.

        Here are some things we read about this problem:
        * http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
        * http://jtushman.github.io/blog/2014/01/14/python-%7C-multiprocessing-and-interrupts/
        * http://bryceboe.com/2010/08/26/python-multiprocessing-and-keyboardinterrupt/

        We could also try harder to terminate processes from more places. We could
        use process.join() to ensure things have a chance to die. We punted these
        things.

        It's possible this whole multiprocessing strategy is wrong-headed. If you
        are reading this code to curse whoever wrote it, see discussion in
        PAASTA-214 and https://reviewboard.yelpcorp.com/r/87320/ and feel free to
        implement one of the other options.
        """
        queue = Queue()
        spawned_processes = []

        def callback(component, stream_info, scribe_env, cluster):
            kw = {
                'scribe_env': scribe_env,
                'service': service,
                'levels': levels,
                'components': components,
                'clusters': clusters,
                'instances': instances,
                'queue': queue,
                'filter_fn': stream_info.filter_fn,
            }

            if stream_info.per_cluster:
                kw['stream_name'] = stream_info.stream_name_fn(
                    service, cluster)
                kw['clusters'] = [cluster]
            else:
                kw['stream_name'] = stream_info.stream_name_fn(service)
            log.debug("Running the equivalent of 'scribereader -e %s %s'" %
                      (scribe_env, kw['stream_name']))
            process = Process(target=self.scribe_tail, kwargs=kw)
            spawned_processes.append(process)
            process.start()

        self.run_code_over_scribe_envs(clusters=clusters,
                                       components=components,
                                       callback=callback)

        # Pull things off the queue and output them. If any thread dies we are no
        # longer presenting the user with the full picture so we quit.
        #
        # This is convenient for testing, where a fake scribe_tail() can emit a
        # fake log and exit. Without the thread aliveness check, we would just sit
        # here forever even though the threads doing the tailing are all gone.
        #
        # NOTE: A noisy tailer in one scribe_env (such that the queue never gets
        # empty) will prevent us from ever noticing that another tailer has died.
        while True:
            try:
                # This is a blocking call with a timeout for a couple reasons:
                #
                # * If the queue is empty and we get_nowait(), we loop very tightly
                # and accomplish nothing.
                #
                # * Testing revealed a race condition where print_log() is called
                # and even prints its message, but this action isn't recorded on
                # the patched-in print_log(). This resulted in test flakes. A short
                # timeout seems to soothe this behavior: running this test 10 times
                # with a timeout of 0.0 resulted in 2 failures; running it with a
                # timeout of 0.1 resulted in 0 failures.
                #
                # * There's a race where thread1 emits its log line and exits
                # before thread2 has a chance to do anything, causing us to bail
                # out via the Queue Empty and thread aliveness check.
                #
                # We've decided to live with this for now and see if it's really a
                # problem. The threads in test code exit pretty much immediately
                # and a short timeout has been enough to ensure correct behavior
                # there, so IRL with longer start-up times for each thread this
                # will surely be fine.
                #
                # UPDATE: Actually this is leading to a test failure rate of about
                # 1/10 even with timeout of 1s. I'm adding a sleep to the threads
                # in test code to smooth this out, then pulling the trigger on
                # moving that test to integration land where it belongs.
                line = queue.get(True, 0.1)
                print_log(line, levels, raw_mode)
            except Empty:
                try:
                    # If there's nothing in the queue, take this opportunity to make
                    # sure all the tailers are still running.
                    running_processes = [
                        tt.is_alive() for tt in spawned_processes
                    ]
                    if not running_processes or not all(running_processes):
                        log.warn(
                            'Quitting because I expected %d log tailers to be alive but only %d are alive.'
                            % (
                                len(spawned_processes),
                                running_processes.count(True),
                            ))
                        for process in spawned_processes:
                            if process.is_alive():
                                process.terminate()
                        break
                except KeyboardInterrupt:
                    # Die peacefully rather than printing N threads worth of stack
                    # traces.
                    #
                    # This extra nested catch is because it's pretty easy to be in
                    # the above try block when the user hits Ctrl-C which otherwise
                    # dumps a stack trace.
                    log.warn('Terminating.')
                    break
            except KeyboardInterrupt:
                # Die peacefully rather than printing N threads worth of stack
                # traces.
                log.warn('Terminating.')
                break

    def print_logs_by_time(self, service, start_time, end_time, levels,
                           components, clusters, instances, raw_mode):
        aggregated_logs: List[Dict[str, Any]] = []

        if 'marathon' in components or 'chronos' in components:
            paasta_print(
                PaastaColors.red(
                    "Warning, you have chosen to get marathon or chronos logs based "
                    "on time. This command may take a dozen minutes or so to run "
                    "because marathon and chronos are on shared streams.\n", ),
                file=sys.stderr,
            )

        def callback(component, stream_info, scribe_env, cluster):
            if stream_info.per_cluster:
                stream_name = stream_info.stream_name_fn(service, cluster)
            else:
                stream_name = stream_info.stream_name_fn(service)

            ctx = self.scribe_get_from_time(scribe_env, stream_name,
                                            start_time, end_time)
            self.filter_and_aggregate_scribe_logs(
                scribe_reader_ctx=ctx,
                scribe_env=scribe_env,
                stream_name=stream_name,
                levels=levels,
                service=service,
                components=components,
                clusters=clusters,
                instances=instances,
                aggregated_logs=aggregated_logs,
                filter_fn=stream_info.filter_fn,
                parser_fn=stream_info.parse_fn,
                start_time=start_time,
                end_time=end_time,
            )

        self.run_code_over_scribe_envs(
            clusters=clusters,
            components=components,
            callback=callback,
        )

        aggregated_logs.sort(key=lambda log_line: log_line['sort_key'])
        for line in aggregated_logs:
            print_log(line['raw_line'], levels, raw_mode)

    def print_last_n_logs(self, service, line_count, levels, components,
                          clusters, instances, raw_mode):
        aggregated_logs: List[Dict[str, Any]] = []

        def callback(component, stream_info, scribe_env, cluster):
            stream_info = self.get_stream_info(component)

            if stream_info.per_cluster:
                stream_name = stream_info.stream_name_fn(service, cluster)
            else:
                stream_name = stream_info.stream_name_fn(service)

            ctx = self.scribe_get_last_n_lines(scribe_env, stream_name,
                                               line_count)
            self.filter_and_aggregate_scribe_logs(
                scribe_reader_ctx=ctx,
                scribe_env=scribe_env,
                stream_name=stream_name,
                levels=levels,
                service=service,
                components=components,
                clusters=clusters,
                instances=instances,
                aggregated_logs=aggregated_logs,
                filter_fn=stream_info.filter_fn,
                parser_fn=stream_info.parse_fn,
            )

        self.run_code_over_scribe_envs(clusters=clusters,
                                       components=components,
                                       callback=callback)
        aggregated_logs.sort(key=lambda log_line: log_line['sort_key'])
        for line in aggregated_logs:
            print_log(line['raw_line'], levels, raw_mode)

    def filter_and_aggregate_scribe_logs(
        self,
        scribe_reader_ctx,
        scribe_env,
        stream_name,
        levels,
        service,
        components,
        clusters,
        instances,
        aggregated_logs,
        parser_fn=None,
        filter_fn=None,
        start_time=None,
        end_time=None,
    ):
        with scribe_reader_ctx as scribe_reader:
            try:
                for line in scribe_reader:
                    if parser_fn:
                        line = parser_fn(line, clusters, service)
                    if filter_fn:
                        if filter_fn(
                                line,
                                levels,
                                service,
                                components,
                                clusters,
                                instances,
                                start_time=start_time,
                                end_time=end_time,
                        ):
                            try:
                                parsed_line = json.loads(line)
                                timestamp = isodate.parse_datetime(
                                    parsed_line.get('timestamp'))
                                if not timestamp.tzinfo:
                                    timestamp = pytz.utc.localize(timestamp)
                            except ValueError:
                                timestamp = pytz.utc.localize(
                                    datetime.datetime.min)

                            line = {'raw_line': line, 'sort_key': timestamp}
                            aggregated_logs.append(line)
            except StreamTailerSetupError as e:
                if 'No data in stream' in str(e):
                    log.warning("Scribe stream %s is empty on %s" %
                                (stream_name, scribe_env))
                    log.warning(
                        "Don't Panic! This may or may not be a problem, depending on "
                        "whether you expect there to be output within this stream."
                    )
                else:
                    raise

    def scribe_get_from_time(self, scribe_env, stream_name, start_time,
                             end_time):
        # Scribe connection details
        host_and_port = scribereader.get_env_scribe_host(scribe_env, True)
        host = host_and_port['host']
        port = host_and_port['port']

        # Annoyingly enough, scribe needs special handling if we're trying to retrieve logs from today.

        # There is a datetime.date.today but we can't use it because it isn't UTC
        today = datetime.datetime.utcnow().date()
        if start_time.date() == today or end_time.date() == today:
            # The reason we need a fake context here is that scribereader is a bit inconsistent in its
            # return types: get_stream_reader returns a context manager that must be entered so its cleanup
            # code runs, while get_stream_tailer simply returns an object that can be iterated over. We'd
            # still like the cleanup code for get_stream_reader to be executed by this function's caller,
            # and this is one of the simpler ways to achieve that without duplicating if statements at
            # every call site of this method.
            @contextmanager
            def fake_context():
                log.info(
                    "Running the equivalent of 'scribereader -f -e %s %s'" %
                    (scribe_env, stream_name))
                yield scribereader.get_stream_tailer(stream_name, host, port,
                                                     True, -1)

            return fake_context()

        log.info("Running the equivalent of 'scribereader -e %s %s" %
                 (scribe_env, stream_name))
        return scribereader.get_stream_reader(stream_name, host, port,
                                              start_time, end_time)

    def scribe_get_last_n_lines(self, scribe_env, stream_name, line_count):
        # Scribe connection details
        host_and_port = scribereader.get_env_scribe_host(scribe_env, True)
        host = host_and_port['host']
        port = host_and_port['port']

        # Please read comment above in scribe_get_from_time as to why a fake
        # context here is necessary
        @contextmanager
        def fake_context():
            yield scribereader.get_stream_tailer(stream_name, host, port, True,
                                                 line_count)

        return fake_context()

    def scribe_tail(
        self,
        scribe_env,
        stream_name,
        service,
        levels,
        components,
        clusters,
        instances,
        queue,
        filter_fn,
        parse_fn=None,
    ):
        """Creates a scribetailer for a particular environment.

        When it encounters a line that it should report, it sticks it into the
        provided queue.

        This code is designed to run in a thread as spawned by tail_paasta_logs().
        """
        try:
            log.debug("Going to tail %s scribe stream in %s" %
                      (stream_name, scribe_env))
            host_and_port = scribereader.get_env_scribe_host(scribe_env, True)
            host = host_and_port['host']
            port = host_and_port['port']
            tailer = scribereader.get_stream_tailer(stream_name, host, port)
            for line in tailer:
                if parse_fn:
                    line = parse_fn(line, clusters, service)
                if filter_fn(line, levels, service, components, clusters,
                             instances):
                    queue.put(line)
        except KeyboardInterrupt:
            # Die peacefully rather than printing N threads worth of stack
            # traces.
            pass
        except StreamTailerSetupError as e:
            if 'No data in stream' in str(e):
                log.warning("Scribe stream %s is empty on %s" %
                            (stream_name, scribe_env))
                log.warning(
                    "Don't Panic! This may or may not be a problem, depending on "
                    "whether you expect there to be output within this stream."
                )
                # Enter a wait so the process isn't considered dead.
                # This is just a large number, since apparently some python interpreters
                # don't like being passed sys.maxsize.
                sleep(2**16)
            else:
                raise

    def determine_scribereader_envs(self, components, cluster):
        """Returns a list of environments that scribereader needs to connect
        to based on a given list of components and the cluster involved.

        Some components are in certain environments, regardless of the cluster.
        Some clusters do not match up with the scribe environment names, so
        we figure that out here"""
        envs: List[str] = []
        for component in components:
            # If a component has a 'source_env', we use that
            # otherwise we lookup what scribe env is associated with a given cluster
            env = LOG_COMPONENTS[component].get(
                'source_env', self.cluster_to_scribe_env(cluster))
            if 'additional_source_envs' in LOG_COMPONENTS[component]:
                envs += LOG_COMPONENTS[component]['additional_source_envs']
            envs.append(env)
        return set(envs)

    def cluster_to_scribe_env(self, cluster):
        """Looks up the particular scribe env associated with a given paasta cluster.

        Scribe has its own "environment" key, which doesn't always map 1:1 with our
        cluster names, so we have to maintain a manual mapping.

        This mapping is deployed as a config file via puppet as part of the public
        config deployed to every server.
        """
        env = self.cluster_map.get(cluster, None)
        if env is None:
            paasta_print("I don't know where scribe logs for %s live?" %
                         cluster)
            sys.exit(1)
        else:
            return env
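
run_code_over_scribe_envs() factors the scribe env and component iteration out of tail_logs(), print_logs_by_time(), and print_last_n_logs(), each of which only supplies a callback. Below is a hedged usage sketch of that callback pattern: it merely prints which streams would be read, and the reader construction, cluster map, and service names are hypothetical.

def preview_streams(reader, service, components, clusters):
    # Illustrative only: report which scribe streams the reader would visit,
    # using the same (component, stream_info, scribe_env, cluster) callback
    # signature that the methods above rely on.
    def callback(component, stream_info, scribe_env, cluster):
        if stream_info.per_cluster:
            stream_name = stream_info.stream_name_fn(service, cluster)
        else:
            stream_name = stream_info.stream_name_fn(service)
        print("%s: would read %s from scribe env %s" % (component, stream_name, scribe_env))

    reader.run_code_over_scribe_envs(
        clusters=clusters,
        components=components,
        callback=callback,
    )


# Hypothetical usage; the cluster map and names are made up:
# reader = ScribeLogReader(cluster_map={'norcal-devc': 'norcal-devc'})
# preview_streams(reader, 'example-service', {'stdout', 'marathon'}, {'norcal-devc'})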