def get_bags_of_sifts(image_arrays, vocabulary, stride=5):
    """
    You will want to construct SIFT features here in the same way you
    did in build_vocabulary() (except for possibly changing the sampling
    rate) and then assign each local feature to its nearest cluster center
    and build a histogram indicating how many times each cluster was used.
    Don't forget to normalize the histogram, or else a larger image with more
    SIFT features will look very different from a smaller version of the same
    image.

    Useful functions:
    -   torch.from_numpy(img_array) for converting a numpy array to a torch
    tensor for siftnet. 
    -   torch.view() for reshaping the torch tensor
    -   use torch.type() or np.array(img_array).astype() for typecasting
    -   generate_sample_points() from utils.py for sampling interest points
    -   get_siftnet_features() from SIFTNet: you can pass in the image
    tensor in grayscale, together with the sampled x and y positions to
    obtain the SIFT features
    -   np.histogram(): an easy way to calculate, for a particular image, how
    the visual words are distributed across the vocab. Check
    https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
    for examples on how to use a histogram on an input array.
    -   np.linalg.norm() for normalizing the histogram


    Useful note:
    - You will first need to convert each array in image_arrays into a float type
    array. Then convert each of these image arrays into a 4-D torch tensor by using
    torch.from_numpy(img_array) and reshaping it to (1 x 1 x H x W), where H and W
    are the image height and width. You can use tensor views, i.e. torch.view, to
    do the reshaping.

    Args:
    -   image_arrays: A list of N input images as Numpy arrays, in grayscale
    -   vocabulary: A numpy array of dimensions: vocab_size x 128 where each
    row is a kmeans centroid or visual word.
    -   stride: same functionality as the stride in build_vocabulary().

    Returns:
    -   image_feats: N x d matrix, where d is the dimensionality of the
    feature representation. In this case, d will equal the number of
    clusters or equivalently the number of entries in each image's histogram
    (vocab_size) below.
    """
    # load vocabulary
    vocab = vocabulary

    vocab_size = len(vocab)
    num_images = len(image_arrays)

    feats = np.zeros((num_images, vocab_size))

    #############################################################################
    # TODO: YOUR CODE HERE
    #############################################################################

    for i in range(len(image_arrays)):
        # Sample interest points on a regular grid, then extract SIFT features.
        xv, yv = generate_sample_points(image_arrays[i].shape[0],
                                        image_arrays[i].shape[1], stride)
        img = torch.Tensor(image_arrays[i]).view(
            (1, 1, image_arrays[i].shape[0], image_arrays[i].shape[1]))
        img = img.type(torch.float32)
        sift = get_siftnet_features(img, xv, yv)  # (num_points, 128) numpy array

        # Assign each descriptor to its nearest visual word and count occurrences.
        indices = kmeans_quantize(sift, vocab)
        for idx in indices:
            feats[i, idx] += 1

        # L2-normalize the histogram so a larger image with more features
        # stays comparable to a smaller version of the same image.
        feats[i, :] = feats[i, :] / np.linalg.norm(feats[i, :])

    #############################################################################
    #                             END OF YOUR CODE
    #############################################################################

    return feats
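A minimal end-to-end usage sketch for the two functions in this listing. load_grayscale_images() is a hypothetical helper (not part of the assignment code) standing in for whatever loads the dataset as grayscale Numpy arrays:

# Hedged usage sketch; load_grayscale_images() is hypothetical.
train_images = load_grayscale_images('data/train')   # list of (H, W) arrays
test_images = load_grayscale_images('data/test')

vocab = build_vocabulary(train_images, vocab_size=50, stride=20)   # (50, 128)
train_feats = get_bags_of_sifts(train_images, vocab, stride=5)     # (N_train, 50)
test_feats = get_bags_of_sifts(test_images, vocab, stride=5)       # (N_test, 50)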
Example #2
def get_bags_of_sifts(image_arrays, vocabulary, step_size=10):
    """
    This feature representation is described in the lecture materials,
    and Szeliski chapter 14.
    You will want to construct SIFT features here in the same way you
    did in build_vocabulary() (except for possibly changing the sampling
    rate) and then assign each local feature to its nearest cluster center
    and build a histogram indicating how many times each cluster was used.
    Don't forget to normalize the histogram, or else a larger image with more
    SIFT features will look very different from a smaller version of the same
    image.

    Useful functions:
    -  np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor
            in grayscale, together with the sampled x and y positions to obtain
            the SIFT features
    -   np.histogram() or np.bincount(): easy ways to calculate, for a
            particular image, how the visual words are distributed across the vocab


    Args:
    -   image_arrays: A list of input images in Numpy array, in grayscale
    -   vocabulary: A numpy array of dimensions:
            vocab_size x 128 where each row is a kmeans centroid
            or visual word.
    -   step_size: same functionality as the stride in build_vocabulary(). Feel
            free to experiment with different values, but the rationale is that
            you may want to set it smaller than stride in build_vocabulary()
            such that you collect more features from the image.

    Returns:
    -   image_feats: N x d matrix, where d is the dimensionality of the
            feature representation. In this case, d will be equal to the number
            of clusters or equivalently the number of entries in each image's
            histogram (vocab_size) below.
    """
    # load vocabulary
    vocab = vocabulary
    vocab_size = len(vocab)
    num_images = len(image_arrays)
    feats = np.zeros((num_images, vocab_size))
    for i, image in enumerate(image_arrays):
        img_h, img_w = image.shape
        image = np.array(image, dtype='float32')
        image_tensor = torch.from_numpy(image)
        image_tensor = image_tensor.reshape((1, 1, img_h, img_w))
        # Sample a regular grid of positions, leaving a 10-pixel margin.
        h = np.arange(10, img_h - 10, step=step_size)
        w = np.arange(10, img_w - 10, step=step_size)
        grid_y, grid_x = np.meshgrid(h, w)
        dim2_idxs, dim3_idxs = grid_y.flatten(), grid_x.flatten()
        sift_feats = get_siftnet_features(image_tensor, dim3_idxs,
                                          dim2_idxs)  # (K, 128)
        # Assign each descriptor to its nearest visual word, histogram the
        # assignments, and L2-normalize as the docstring requires.
        indices = kmeans_quantize(sift_feats, vocab)
        hist, bin_edges = np.histogram(indices, bins=np.arange(vocab_size + 1))
        feats[i] = hist / np.linalg.norm(hist)
    return feats
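The histogram hints above depend on the bin convention; a small standalone check with toy data (not from the assignment) showing that np.histogram with integer edges and np.bincount with minlength give the same per-word counts:

import numpy as np

vocab_size = 5
indices = np.array([0, 2, 2, 4, 1, 2])   # toy cluster assignments for one image

# Edges [0, 1, ..., vocab_size] yield exactly one bin per visual word.
hist, _ = np.histogram(indices, bins=np.arange(vocab_size + 1))
counts = np.bincount(indices, minlength=vocab_size)

print(hist)    # [1 1 3 0 1]
print(counts)  # [1 1 3 0 1], identical for integer indices in [0, vocab_size)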
def build_vocabulary(image_arrays, vocab_size=50, stride=20, max_iter=10):
    """
    This function will generate the vocabulary which will be further used
    for bag of words classification.

    To generate the vocab you first randomly sample features from the
    training set. Get SIFT features for the images using
    get_siftnet_features() method. This method takes as input the image
    tensor and x and y coordinates as arguments.
    Now cluster sampled features from all images using kmeans method
    implemented by you previously and return the vocabulary of words i.e.
    cluster centers.

    Points to note:
    *   To save computation time, you don't necessarily need to
    sample from all images, although it would be better to do so.
    *   Sample the descriptors from each image to save memory and
    speed up the clustering.
    *   For testing, you may experiment with larger
    stride so you just compute fewer points and check the result quickly.
    *   The default vocab_size of 50 is sufficient for you to get a
    decent accuracy (>40%), but you are free to experiment with other values.

    Useful functions:
    -   torch.from_numpy(img_array) for converting a numpy array to a torch
    tensor for siftnet
    -   torch.view() for reshaping the torch tensor
    -   use torch.type() or np.array(img_array).astype() for typecasting
    -   generate_sample_points() from utils.py for sampling interest points

    Useful note:
    - You will first need to convert each array in image_arrays into a float type array. Then convert each of these image arrays into a 4-D torch tensor by using torch.from_numpy(img_array) and then reshaping it to (1 x 1 x H x W) where H and W are the image height and width. You can use tensor views i.e torch.view to do the reshaping.

    Args:
    -   image_arrays: list of images in Numpy arrays, in grayscale
    -   vocab_size: size of vocabulary
    -   stride: the stride of your SIFT sampling
    -   max_iter: maximum number of kmeans iterations

    Returns:
    -   vocab: This is a (vocab_size, dim) Numpy array (vocabulary). Where
    dim is the length of your SIFT descriptor. Each row is a cluster
    center/visual word.
    """

    dim = 128  # length of the SIFT descriptors that you are going to compute.
    vocab = None

    #############################################################################
    # TODO: YOUR CODE HERE
    #############################################################################

    for i in range(len(image_arrays)):
        # Sample grid points and extract SIFT descriptors for each training image.
        xv, yv = generate_sample_points(image_arrays[i].shape[0],
                                        image_arrays[i].shape[1], stride)
        img = torch.Tensor(image_arrays[i]).view(
            (1, 1, image_arrays[i].shape[0], image_arrays[i].shape[1]))
        img = img.type(torch.float32)
        sift = get_siftnet_features(img, xv, yv)  # (num_points, 128) numpy array
        if i == 0:
            total = sift
        else:
            total = np.concatenate((total, sift), axis=0)

    # Cluster all sampled descriptors; the centroids are the visual words.
    vocab = kmeans(total, vocab_size, max_iter=max_iter)

    #############################################################################
    #                             END OF YOUR CODE
    #############################################################################

    return vocab
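Repeated np.concatenate inside the loop above copies the growing array on every iteration. A sketch of an alternative that collects per-image descriptor arrays in a Python list and concatenates once at the end (same helpers and variables as the function above):

# Variation on the accumulation loop above: append to a list, concatenate once.
descriptors = []
for img_array in image_arrays:
    xv, yv = generate_sample_points(img_array.shape[0], img_array.shape[1], stride)
    img = torch.from_numpy(np.array(img_array, dtype=np.float32)).view(
        (1, 1, img_array.shape[0], img_array.shape[1]))
    descriptors.append(get_siftnet_features(img, xv, yv))
total = np.concatenate(descriptors, axis=0)   # (total_points, 128)
vocab = kmeans(total, vocab_size, max_iter=max_iter)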
Example #4
def build_vocabulary(image_arrays, vocab_size, stride=20):
    """
    This function will sample SIFT descriptors from the training images,
    cluster them with kmeans, and then return the cluster centers.

    Load images from the training set. To save computation time, you don't
    necessarily need to sample from all images, although it would be better
    to do so. You can randomly sample the descriptors from each image to save
    memory and speed up the clustering. For testing, you may experiment with
    larger stride so you just compute fewer points and check the result quickly.

    In order to pass the unit test, leave out a 10-pixel margin in the image,
    that is, start your x and y from 10, and stop at image_width - 10 and
    image_height - 10.

    For each loaded image, get some SIFT features. You don't have to get as
    many SIFT features as you will in get_bags_of_sifts, because you're only
    trying to get a representative sample here.

    Once you have tens of thousands of SIFT features from many training
    images, cluster them with kmeans. The resulting centroids are now your
    visual word vocabulary.

    Note that the default vocab_size of 50 is sufficient for you to get a decent
    accuracy (>40%), but you are free to experiment with other values.

    Useful functions:
    -   np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor in
            grayscale, together with the sampled x and y positions to obtain the
            SIFT features
    -   np.arange() and np.meshgrid(): for you to generate the sample x and y
            positions faster

    Args:
    -   image_arrays: list of images in Numpy arrays, in grayscale
    -   vocab_size: size of vocabulary
    -   stride: the stride of your SIFT sampling

    Returns:
    -   vocab: This is a (vocab_size, dim) Numpy array (vocabulary). Where dim
            is the length of your SIFT descriptor. Each row is a cluster center
            / visual word.
    """

    dim = 128  # length of the SIFT descriptors that you are going to compute.
    vocab = np.zeros((vocab_size, dim))
    sift_vectors = np.zeros(dim)  # placeholder row, dropped before clustering
    for image in image_arrays:
        img_h, img_w = image.shape
        image = np.array(image, dtype='float32')
        image_tensor = torch.from_numpy(image)
        image_tensor = image_tensor.reshape((1, 1, img_h, img_w))
        # Regular grid of sample positions, leaving a 10-pixel margin.
        h = np.arange(10, img_h - 10, step=stride)
        w = np.arange(10, img_w - 10, step=stride)
        grid_y, grid_x = np.meshgrid(h, w)
        dim2_idxs, dim3_idxs = grid_y.flatten(), grid_x.flatten()
        feats = get_siftnet_features(image_tensor, dim3_idxs, dim2_idxs)
        sift_vectors = np.vstack((sift_vectors, feats))
    # Drop the placeholder row and cluster; the centroids are the visual words.
    vocab = kmeans(sift_vectors[1:], vocab_size)
    return vocab
Example #5
def get_bags_of_sifts(image_arrays, vocabulary, step_size=10):
    """
    This feature representation is described in the lecture materials,
    and Szeliski chapter 14.
    You will want to construct SIFT features here in the same way you
    did in build_vocabulary() (except for possibly changing the sampling
    rate) and then assign each local feature to its nearest cluster center
    and build a histogram indicating how many times each cluster was used.
    Don't forget to normalize the histogram, or else a larger image with more
    SIFT features will look very different from a smaller version of the same
    image.

    Useful functions:
    -  np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor
            in grayscale, together with the sampled x and y positions to obtain
            the SIFT features
    -   np.histogram() or np.bincount(): easy ways to calculate, for a
            particular image, how the visual words are distributed across the vocab


    Args:
    -   image_arrays: A list of input images in Numpy array, in grayscale
    -   vocabulary: A numpy array of dimensions:
            vocab_size x 128 where each row is a kmeans centroid
            or visual word.
    -   step_size: same functionality as the stride in build_vocabulary(). Feel
            free to experiment with different values, but the rationale is that
            you may want to set it smaller than stride in build_vocabulary()
            such that you collect more features from the image.

    Returns:
    -   image_feats: N x d matrix, where d is the dimensionality of the
            feature representation. In this case, d will be equal to the number
            of clusters or equivalently the number of entries in each image's
            histogram (vocab_size) below.
    """
    # load vocabulary
    vocab = vocabulary

    vocab_size = len(vocab)
    num_images = len(image_arrays)

    feats = np.zeros((num_images, vocab_size))

    ###########################################################################
    # TODO: YOUR CODE HERE                                                    #
    ###########################################################################

    # print("START")
    # print(len(image_arrays))
    # counter = 0
    features = [None] * len(image_arrays)
    for i in range(len(image_arrays)):
        # print("img")
        img = image_arrays[i]
        # counter += 1
        # if counter % 100 == 0:
        # print("PROGRESS")
        img_array = np.array(img, dtype='float32')
        img_tensor = torch.from_numpy(img_array)
        img_final = img_tensor.reshape(1, 1, img.shape[0], img.shape[1])
        xv, yv = np.meshgrid(np.arange(10, img.shape[0] - 10, step_size),
                             np.arange(10, img.shape[1] - 10, step_size))
        sift_features = get_siftnet_features(img_final, yv.flatten(),
                                             xv.flatten())
        indices = kmeans_quantize(
            sift_features,
            vocab)  # For each feature, find the centroid it's closest to
        bins = np.bincount(indices, minlength=vocab_size)
        bins = bins / np.linalg.norm(bins)
        features[i] = bins
    feats = np.array(features)
    # print("DONE")

    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return feats
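All of the examples divide by np.linalg.norm(hist) directly. If an image were so small that no grid points survived the 10-pixel margin, the histogram would be all zeros and that division would produce NaNs. A tiny defensive variant (an extra safeguard, not something the assignment requires):

import numpy as np

def normalize_hist(hist):
    # L2-normalize a histogram, leaving an all-zero histogram untouched.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist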
Example #6
def build_vocabulary(image_arrays, vocab_size, stride=20):
    """
    This function will sample SIFT descriptors from the training images,
    cluster them with kmeans, and then return the cluster centers.

    Load images from the training set. To save computation time, you don't
    necessarily need to sample from all images, although it would be better
    to do so. You can randomly sample the descriptors from each image to save
    memory and speed up the clustering. For testing, you may experiment with
    larger stride so you just compute fewer points and check the result quickly.

    In order to pass the unit test, leave out a 10-pixel margin in the image,
    that is, start your x and y from 10, and stop at image_width - 10 and
    image_height - 10.

    For each loaded image, get some SIFT features. You don't have to get as
    many SIFT features as you will in get_bags_of_sifts, because you're only
    trying to get a representative sample here.

    Once you have tens of thousands of SIFT features from many training
    images, cluster them with kmeans. The resulting centroids are now your
    visual word vocabulary.

    Note that the default vocab_size of 50 is sufficient for you to get a decent
    accuracy (>40%), but you are free to experiment with other values.

    Useful functions:
    -   np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor in
            grayscale, together with the sampled x and y positions to obtain the
            SIFT features
    -   np.arange() and np.meshgrid(): for you to generate the sample x and y
            positions faster

    Args:
    -   image_arrays: list of images in Numpy arrays, in grayscale
    -   vocab_size: size of vocabulary
    -   stride: the stride of your SIFT sampling

    Returns:
    -   vocab: This is a (vocab_size, dim) Numpy array (vocabulary). Where dim
            is the length of your SIFT descriptor. Each row is a cluster center
            / visual word.
    """

    dim = 128  # length of the SIFT descriptors that you are going to compute.
    vocab = None

    ###########################################################################
    # TODO: YOUR CODE HERE                                                    #
    ###########################################################################

    features = np.array([])
    for img in image_arrays:
        img_array = np.array(img, dtype='float32')
        img_tensor = torch.from_numpy(img_array)
        img_final = img_tensor.reshape(1, 1, img.shape[0], img.shape[1])
        # Grid of sample positions, leaving a 10-pixel margin on all sides.
        xv, yv = np.meshgrid(np.arange(10, img.shape[0] - 10, stride),
                             np.arange(10, img.shape[1] - 10, stride))
        sift_features = get_siftnet_features(img_final, yv.flatten(),
                                             xv.flatten())
        # Accumulate descriptors from every image.
        if len(features) == 0:
            features = sift_features
        else:
            features = np.concatenate((features, sift_features))
    # Cluster all descriptors; the centroids form the vocabulary.
    vocab = kmeans(features, vocab_size)

    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return vocab
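The build_vocabulary variants above name the meshgrid outputs differently, which makes the x/y ordering easy to get backwards. A small standalone sketch of the convention these examples appear to follow, with x indexing columns (width) and y indexing rows (height):

import numpy as np

img_h, img_w, stride = 40, 50, 10
rows = np.arange(10, img_h - 10, stride)    # y coordinates (height axis)
cols = np.arange(10, img_w - 10, stride)    # x coordinates (width axis)
grid_x, grid_y = np.meshgrid(cols, rows)    # both have shape (len(rows), len(cols))
x, y = grid_x.flatten(), grid_y.flatten()
print(x)   # [10 20 30 10 20 30] -> column (width) coordinates
print(y)   # [10 10 10 20 20 20] -> row (height) coordinates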
Example #7
def get_bags_of_sifts(image_arrays, vocabulary, step_size = 10):
    """
    This feature representation is described in the lecture materials,
    and Szeliski chapter 14.
    You will want to construct SIFT features here in the same way you
    did in build_vocabulary() (except for possibly changing the sampling
    rate) and then assign each local feature to its nearest cluster center
    and build a histogram indicating how many times each cluster was used.
    Don't forget to normalize the histogram, or else a larger image with more
    SIFT features will look very different from a smaller version of the same
    image.

    Useful functions:
    -  np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor
            in grayscale, together with the sampled x and y positions to obtain
            the SIFT features
    -   np.histogram() or np.bincount(): easy ways to calculate, for a
            particular image, how the visual words are distributed across the vocab


    Args:
    -   image_arrays: A list of input images in Numpy array, in grayscale
    -   vocabulary: A numpy array of dimensions:
            vocab_size x 128 where each row is a kmeans centroid
            or visual word.
    -   step_size: same functionality as the stride in build_vocabulary(). Feel
            free to experiment with different values, but the rationale is that
            you may want to set it smaller than stride in build_vocabulary()
            such that you collect more features from the image.

    Returns:
    -   image_feats: N x d matrix, where d is the dimensionality of the
            feature representation. In this case, d will be equal to the number
            of clusters or equivalently the number of entries in each image's
            histogram (vocab_size) below.
    """
    # load vocabulary
    vocab = vocabulary

    vocab_size = len(vocab)
    num_images = len(image_arrays)

    feats = np.zeros((num_images, vocab_size))

    ###########################################################################
    # TODO: YOUR CODE HERE                                                    #
    ###########################################################################
    idx = 0
    stride = step_size
    for img in image_arrays:
        img_array = np.array(img, dtype='float32')
        img_width = img.shape[1]
        img_height = img.shape[0]
        img_tensor = torch.from_numpy(img_array)
        img_tensor = img_tensor.reshape((1, 1, img.shape[0], img.shape[1]))

        # Sample positions on a regular grid, leaving a 10-pixel margin.
        x_s = np.arange(10, img_width - 10, stride)
        y_s = np.arange(10, img_height - 10, stride)
        x, y = np.meshgrid(x_s, y_s)
        x = x.flatten()
        y = y.flatten()

        # Extract SIFT features and assign each one to its nearest centroid.
        sift_feats = np.array(get_siftnet_features(img_tensor, x, y))
        quantized = kmeans_quantize(sift_feats, vocabulary)

        # Histogram the assignments per visual word and L2-normalize.
        bins = np.arange(len(vocab) + 1)
        hist = np.histogram(quantized, bins)[0]
        hist = np.divide(hist, np.linalg.norm(hist))
        feats[idx] = hist
        idx = idx + 1

    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return feats
Example #8
def build_vocabulary(image_arrays, vocab_size, stride = 20):
    """
    This function will sample SIFT descriptors from the training images,
    cluster them with kmeans, and then return the cluster centers.

    Load images from the training set. To save computation time, you don't
    necessarily need to sample from all images, although it would be better
    to do so. You can randomly sample the descriptors from each image to save
    memory and speed up the clustering. For testing, you may experiment with
    larger stride so you just compute fewer points and check the result quickly.

    In order to pass the unit test, leave out a 10-pixel margin in the image,
    that is, start your x and y from 10, and stop at image_width - 10 and
    image_height - 10.

    For each loaded image, get some SIFT features. You don't have to get as
    many SIFT features as you will in get_bags_of_sifts, because you're only
    trying to get a representative sample here.

    Once you have tens of thousands of SIFT features from many training
    images, cluster them with kmeans. The resulting centroids are now your
    visual word vocabulary.

    Note that the default vocab_size of 50 is sufficient for you to get a decent
    accuracy (>40%), but you are free to experiment with other values.

    Useful functions:
    -   np.array(img, dtype='float32'), torch.from_numpy(img_array), and
            img_tensor = img_tensor.reshape(
                (1, 1, img_array.shape[0], img_array.shape[1]))
            for converting a numpy array to a torch tensor for siftnet
    -   get_siftnet_features() from SIFTNet: you can pass in the image tensor in
            grayscale, together with the sampled x and y positions to obtain the
            SIFT features
    -   np.arange() and np.meshgrid(): for you to generate the sample x and y
            positions faster

    Args:
    -   image_arrays: list of images in Numpy arrays, in grayscale
    -   vocab_size: size of vocabulary
    -   stride: the stride of your SIFT sampling

    Returns:
    -   vocab: This is a (vocab_size, dim) Numpy array (vocabulary). Where dim
            is the length of your SIFT descriptor. Each row is a cluster center
            / visual word.
    """
    dim = 128  # length of the SIFT descriptors that you are going to compute.
    vocab = None

    ###########################################################################
    # TODO: YOUR CODE HERE                                                    #
    ###########################################################################
    
    # 1. For each image, convert to a float tensor and reshape to (1, 1, H, W).
    # 2. Sample grid positions with the correct bounds and stride, then call
    #    get_siftnet_features().
    # 3. Stack all SIFT features into one array.
    # 4. Run kmeans on the stacked features; the centroids are the visual words.
    sift_feats = []
    idx = 0
    for img in image_arrays:
        img_array = np.array(img, dtype='float32')
        img_width = img.shape[1]
        img_height = img.shape[0]
        img_tensor = torch.from_numpy(img_array)
        img_tensor = img_tensor.reshape((1, 1, img.shape[0], img.shape[1]))

        # Sample positions on a regular grid, leaving a 10-pixel margin.
        x_s = np.arange(10, img_width - 10, stride)
        y_s = np.arange(10, img_height - 10, stride)
        x, y = np.meshgrid(x_s, y_s)
        x = x.flatten()
        y = y.flatten()

        # Accumulate descriptors from every image.
        if idx == 0:
            sift_feats = np.array(get_siftnet_features(img_tensor, x, y))
        else:
            new_feats = np.array(get_siftnet_features(img_tensor, x, y))
            sift_feats = np.concatenate((sift_feats, new_feats))
        idx = idx + 1

    # Cluster the descriptors; the centroids are the visual words.
    centroids = kmeans(sift_feats, vocab_size, max_iter=15)

    # vocab must have shape (vocab_size, dim).
    if len(centroids) > vocab_size:
        vocab = centroids[:vocab_size]
    else:
        vocab = centroids
    
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return vocab
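The docstrings suggest randomly subsampling descriptors to keep kmeans fast, but none of the examples above actually do it. A minimal sketch, assuming sift_feats is the stacked (K, 128) descriptor array from the loop above and max_descriptors is a budget chosen by the caller:

# Optional subsampling before clustering (max_descriptors is a hypothetical budget).
max_descriptors = 20000
if sift_feats.shape[0] > max_descriptors:
    keep = np.random.choice(sift_feats.shape[0], max_descriptors, replace=False)
    sift_feats = sift_feats[keep]
vocab = kmeans(sift_feats, vocab_size, max_iter=15)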