http://www.sysexpand.com/  

SysExpand .NET Code Base

Programming Exercises > Finding Kth Smallest Element

Exercise #6: Finding Kth Smallest Element in an Unsorted Array

Problem Statement

Given an unsorted array of N integers, write a function that returns Kth smallest number in the array, for zero-based K.

Example: Suppose that the following array with 5 numbers is given: 3, 1, 7, 5, 9. For K=3, the result is 7. This is because, when arry is sorted into 1, 3, 5, 7, 9, the element at zero-based position 3 is number 7.

Keywords: Kth smallest, array, median of medians, selection problem.

Problem Analysis

We could start solvong the problem by simply sorting the array and returning the element at position K. This solution produces the result in time proportional to NlogN. But is there a more efficient solution? If we stick to the sorting idea, we might ask why should we sort the whole array if only the Kth element in the sort order is requested? Let's examine the possibilities that arise when QUICKSORT algorithm is applied. Remember that this algorithm partitions the array into two disjoint sections delimited by some pivot element. Once the partitioning step is completed, the algorithm is run on each of the partitions recursively. These are the steps executed by the sorting algorithm:

1. Pick an element within current segment and call it the pivot

2. Move all elements smaller than or equal to the pivot
   near the beginning of the current segment

3. Move all elements larger than the pivot near the end
   of the current segment

4. Place the pivot on the remaining position in between

5. Recursively call the sorting algorithm on the segment
   to the left from the pivot

6. Recursively call the sorting algorithm on the segment
   to the right from the pivot

The way in which this algorithm works seems to guarantee high sorting performance. Values in the array are divided up into two, hopefully equal, sub-arrays. All values in the left partition are strictly smaller than all elements in the right partition. Consequently, the two partitions can be sorted independently from each other. The "hopefully equal" part is the tricky one, though. Depending on how the pivot is selected, we could get quite different results in terms of relative sizes of the two partitions. If we really get out of luck, pivot will be the smallest of all values in the segment (or largest, with same consequences). The segment will then be divided into an empty left segment, the pivot, and the right segment that contains all the values but the pivot. So in the worst case, the next step will be the same as the current step, only the segment will be by one element shorter. Not much of a reduction, with overall result that worst case runs in quadratic time, i.e. time proportional to square of the number of elements.

Now let's turn back to the Kth smallest element problem. We could approach it in the same way in which QUICKSORT algorithm approaches the sorting problem. First pick the pivot, partition the array and then, instead of sorting the left and the right partition further, test their sizes in order to decide which partition contains the Kth element that is being sought. If we're lucky enough, the Kth element would be the pivot - when things slide in this direction the search is over. Otherwise, Kth element must fall either into the left or into the right partition. If former is the case, then we can recursively run the search for the Kth element inside the left partition, completely ignoring the right partition. Otherwise, the element falls into the right partition and then we run the algorithm recursively only this time on the right partition. The only trick is that in the latter case we do not search for the Kth element in the following iterations any more. Instead, index of the requested element must be reduced by the size of the left partition and by one more, standing for the pivot. Below is the modified algorithm.

1. Pick an element within current segment
   and call it the pivot

2. Count elements that are smaller and
   elements that are larger than the pivot

3. If number of elements smaller than the pivot
   is larger than K, then move those elements
   to the beginning of the array and run
   the algorithm recursively only on that part of the array.

4. Otherwise, if number of elements smaller than the pivot
   plus number of elements equal to the pivot is larger
   than K, then Kth element is equal to pivot
   so just return the pivot and finish.

5. Otherwise, move all elements larger than the pivot
   to the beginning of the array and run the algorithm
   recursively only on that part of the array.

Once again, if we are lucky enough with the pivot selection, this algorithm will cut the array in half in every step, converging quickly towards the solution. But once again, poor pivot selection leads to very slow convergence. In the worst case, only the pivot will be removed for the next iteration, leading to grim prediction in which the whole algorithm will have to be repeated as many times through the array as many elements there are in the whole array - hardly that we could devise any algorithm slower than that. Success of the whole idea depends on good pivot selection. That is what we are going to discuss next.

There are many enhancements to the QUICKSORT algorithm intended to improve worst case behavior. Basically, should the pivot element be the median, we would rest assured that the array will be divided into equal halves in each iteration. One of the enhancements has actually lead to a completely new algorithm by turning the question the other way around: What is the most efficient way to discover the median of an array? The idea went like this. First we divide the array into N/C columns, each column containing C consecutive elements of the array. Note that C is a relatively small odd number (many authors suggest five elements per column). Next, we sort each column separately and pick its exact median (this is why it's important for C to be odd). After this step, we have obtained N/C medians and then we can simply run the algorithm recursively on those values only. At some point there will be only one value remaining (called median of medians), it is selected as the pivot. Further down the line, the algorithm goes the same as given above - pivot is used to split the array into values that are smaller and values that are larger than the pivot.

The only thing to worry about is what is the worst case in this setting, i.e. what is the smallest number of values that are discarded in each iteration of the algorithm. It has been proven that in each iteration of the algorithm the pivot element selected in this way discards at least one quarter of all elements still in focus. Consequently, we are never going to get into an unfair situation in which array hardly reduces between the two iterations. On the contrary, this is the proof that the array will converge towards the Kth element quite quickly - to be perfectly clear about the time requirements, it has been proven that number of comparisons required to isolate the Kth element is proportional to number of elements in the array N. This algorithm has been proposed in the early 1970s and ever since it has remained the most efficient algorithm with guaranteed linear worst case running time. Compared to quadratic worst time required by algorithms derived from QUICKSORT, it comes as no surprise that this algorithm is still in use. Below is the whole algorithm - it will sound quite simple when compared to pondering that lead to its discovery.

1. Divide the array into N/C columns of elements,
   for small odd C.

2. Find the median of each column by sorting it.

3. Take only the medians and repeat steps 1-2 recursively
   until only one value remains. That value is picked as the pivot.

4. Iterate through the array and count number of elements
   strictly smaller than the pivot (S), larger than the pivot (L)
   and equal to the pivot (E=N-S-L).

5. If N>K, move all values smaller than the pivot
   to the beginning of the array and recursively run
   the whole algorithm on that sub-array.

6. If N+E>K, conclude that Kth element is equal
   to the current pivot so return the pivot value
   and terminate the algorithm.

7. Otherwise, move all values larger than the pivot
   to the beginning of the array and recursively run
   the whole algorithm on that sub-array.

In the following section we are going to provide implementation of this algorithm in C#. The only thing that should be discussed is step 2 of the algorithm, in which medians for columns are extracted. The algorithm is designed so that simple sorting with quadratic running time is applied to each column. Performance of this step is guaranteed by the small value of C. There comes the suggestion to take C=5 - this advice was derived from the fact that five values can be stored in processor's registers, making the sort operation extremely fast in practice. Moreover, columns could be sorted only until the middle element is sorted, since any element past that position cannot be any smaller. But that optimization may cost more than it saves for small C. For C=5, it would save only one comparison to sort the last two elements in the column. In solution presented below, this last optimization is not applied.

Solution

In this section we are providing the complete source code for selecting Kth smallest integer value from an unsorted array. This implementation takes C=5 as assumed parameter. Generally, this value could also be passed to the function as an argument.

int FindKthSmallest(int[] a, int k)
{

    int value = 0;
    int n = a.Length;
    int c = 5;  // Constant used to divide the array into columns

    while (true)
    {

        // Extract median of medians and take it as the pivot
        int pivot = FindPivot(a, n, c);

        // Now count how many smaller and larger elements are there
        int smallerCount = 0;
        int largerCount = 0;
        CountElements(a, n, pivot, out smallerCount, out largerCount);

        // Finally, partition the array
        if (k < smallerCount)
        {
            Partition(a, ref n, pivot, true);
        }
        else if (k < n - largerCount)
        {
            value = pivot;
            break;
        }
        else
        {
            k -= n - largerCount;
            Partition(a, ref n, pivot, false);
        }

    }

    return value;

}

int FindPivot(int[] a, int n, int c)
{

    while (n > 1)
    {

        int pos = 0;
        int tmp = 0;

        for (int start = 0; start < n; start += c)
        {

            int end = start + c;
            if (end > n)    // Last column may have
                end = n;    // less than c elements

            // Sort the column
            for (int i = start; i < end - 1; i++)
                for (int j = i + 1; j < end; j++)
                    if (a[j] < a[i])
                    {
                        tmp = a[i];
                        a[i] = a[j];
                        a[j] = tmp;
                    }

            // Pick the column's median and promote it
            // to the beginning of the array
            end = (start + end)  2;    // Median position
            tmp = a[end];
            a[end] = a[pos];
            a[pos++] = tmp;

        }

        n = pos;    // Reduce the array and repeat recursively

    }

    return a[0];    // Last median of medians is the pivot

}

void Partition(int[] a, ref int n, int pivot, bool extractSmaller)
{
    int pos = 0;
    for (int i = 0; i < n; i++)
        if ((extractSmaller && a[i] < pivot) ||
            (!extractSmaller && a[i] > pivot))
        {
            int tmp = a[i];
            a[i] = a[pos];
            a[pos++] = tmp;
        }
    n = pos;
}

Demonstration

In this section we are going to present a short console application that demonstrates the use of the FindKthSmallest function. Here is the complete source code of the console application:

using System;

namespace KthSmallest
{

    public class Program
    {

        static Random _rnd = new Random();

        static int[] GenerateArray(int n)
        {

            int[] array = new int[(Int64)n];

            for (int i = 0; i < n; i++)
                array[i] = _rnd.Next(100);

            // Now randomize the array
            for (int i = array.Length - 1; i > 0; i--)
            {
                int index = _rnd.Next(i);
                int tmp = array[index];
                array[index] = array[i];
                array[i] = tmp;
            }

            return array;

        }

        static void PrintArray(int[] array)
        {
            for (int i = 0; i < array.Length; i++)
                Console.Write("{0,4}", array[i]);
            Console.WriteLine();
        }

        static int FindKthSmallest(int[] a, int k) { ... }

        static int FindPivot(int[] a, int n, int c) { ... }

        static void CountElements(int[] a, int n, int pivot,
                                  out int smallerCount,
                                  out int largerCount) { ... }

        static void Partition(int[] a, ref int n, int pivot,
                              bool extractSmaller) { ... }

        static void Main(string[] args)
        {

            while (true)
            {

                Console.Write("n=");
                int n = int.Parse(Console.ReadLine());

                if (n <= 0)
                    break;

                Console.Write("k=");
                int k = int.Parse(Console.ReadLine());

                int[] a = GenerateArray(n);

                int value = FindKthSmallest(a, k);

                PrintArray(a);
                Console.WriteLine("{0}. smallest value is {1}", k, value);

            }

            Console.Write("Press ENTER to continue... ");
            Console.ReadLine();

        }
    }
}

And this is one possible output produced by this demonstration application:

n=10
k=4
  37  36   0  27   7  38  65  98  43  90
4. smallest value is 37
n=9
k=3
  48  48   4  63  73  43  83  81  92
3. smallest value is 48
n=7
k=6
  93  38  13  43  73  90  19
6. smallest value is 93
n=11
k=0
   3  17  41  45  52  59  64  60  64  92  82
0. smallest value is 3
n=0
Press ENTER to continue...

Follow-Up Exercises

Readers are suggested to try these extended tasks:

  • Apply the function to very large arrays containing random data (millions to hundreds of millions of integers). Measure how time required to produce the result grows with growing size of the array.
  • Try to solve the problem by using Array.Sort and picking the Kth element from the fully sorted array. Repeat the performance test to measure how sorting time increases with growing size of the array.
  • Compare absolute times obtained in previous tests and discuss the differences.

See also:

 

Published: Apr 27, 2013; Modified: Jan 22, 2014

webmasters