Exercise #6: Finding Kth Smallest Element in an Unsorted Array by Zoran Horvat @zoranh75
Problem Statement
Given an unsorted array of N integers, write a function that returns Kth
smallest number in the array, for zerobased K.
Example: Suppose that the following array with 5 numbers is given: 3, 1, 7, 5,
9. For K=3, the result is 7. This is because, when arry is sorted into 1, 3, 5,
7, 9, the element at zerobased position 3 is number 7.
Keywords: Kth smallest, array, median of medians, selection problem.
Problem Analysis
We could start solvong the problem by simply sorting the array and returning
the element at position K. This solution produces the result in time
proportional to NlogN. But is there a more efficient solution? If we stick to
the sorting idea, we might ask why should we sort the whole array if only the
Kth element in the sort order is requested? Let's examine the possibilities
that arise when QUICKSORT algorithm is applied. Remember that this algorithm
partitions the array into two disjoint sections delimited by some pivot
element. Once the partitioning step is completed, the algorithm is run on each
of the partitions recursively. These are the steps executed by the sorting
algorithm:
1. Pick an element within current segment and call it the pivot
2. Move all elements smaller than or equal to the pivot
near the beginning of the current segment
3. Move all elements larger than the pivot near the end
of the current segment
4. Place the pivot on the remaining position in between
5. Recursively call the sorting algorithm on the segment
to the left from the pivot
6. Recursively call the sorting algorithm on the segment
to the right from the pivot
The way in which this algorithm works seems to guarantee high sorting
performance. Values in the array are divided up into two, hopefully equal,
subarrays. All values in the left partition are strictly smaller than all
elements in the right partition. Consequently, the two partitions can be sorted
independently from each other. The "hopefully equal" part is the tricky one,
though. Depending on how the pivot is selected, we could get quite different
results in terms of relative sizes of the two partitions. If we really get out
of luck, pivot will be the smallest of all values in the segment (or largest,
with same consequences). The segment will then be divided into an empty left
segment, the pivot, and the right segment that contains all the values but the
pivot. So in the worst case, the next step will be the same as the current
step, only the segment will be by one element shorter. Not much of a reduction,
with overall result that worst case runs in quadratic time, i.e. time
proportional to square of the number of elements.
Now let's turn back to the Kth smallest element problem. We could approach it
in the same way in which QUICKSORT algorithm approaches the sorting problem.
First pick the pivot, partition the array and then, instead of sorting the left
and the right partition further, test their sizes in order to decide which
partition contains the Kth element that is being sought. If we're lucky enough,
the Kth element would be the pivot  when things slide in this direction the
search is over. Otherwise, Kth element must fall either into the left or into
the right partition. If former is the case, then we can recursively run the
search for the Kth element inside the left partition, completely ignoring the
right partition. Otherwise, the element falls into the right partition and then
we run the algorithm recursively only this time on the right partition. The
only trick is that in the latter case we do not search for the Kth element in
the following iterations any more. Instead, index of the requested element must
be reduced by the size of the left partition and by one more, standing for the
pivot. Below is the modified algorithm.
1. Pick an element within current segment
and call it the pivot
2. Count elements that are smaller and
elements that are larger than the pivot
3. If number of elements smaller than the pivot
is larger than K, then move those elements
to the beginning of the array and run
the algorithm recursively only on that part of the array.
4. Otherwise, if number of elements smaller than the pivot
plus number of elements equal to the pivot is larger
than K, then Kth element is equal to pivot
so just return the pivot and finish.
5. Otherwise, move all elements larger than the pivot
to the beginning of the array and run the algorithm
recursively only on that part of the array.
Once again, if we are lucky enough with the pivot selection, this algorithm
will cut the array in half in every step, converging quickly towards the
solution. But once again, poor pivot selection leads to very slow convergence.
In the worst case, only the pivot will be removed for the next iteration,
leading to grim prediction in which the whole algorithm will have to be
repeated as many times through the array as many elements there are in the
whole array  hardly that we could devise any algorithm slower than that.
Success of the whole idea depends on good pivot selection. That is what we are
going to discuss next.
There are many enhancements to the QUICKSORT algorithm intended to improve
worst case behavior. Basically, should the pivot element be the median, we
would rest assured that the array will be divided into equal halves in each
iteration. One of the enhancements has actually lead to a completely new
algorithm by turning the question the other way around: What is the most
efficient way to discover the median of an array? The idea went like this.
First we divide the array into N/C columns, each column containing C
consecutive elements of the array. Note that C is a relatively small odd number
(many authors suggest five elements per column). Next, we sort each column
separately and pick its exact median (this is why it's important for C to be
odd). After this step, we have obtained N/C medians and then we can simply run
the algorithm recursively on those values only. At some point there will be
only one value remaining (called median of medians), it is selected as the
pivot. Further down the line, the algorithm goes the same as given above 
pivot is used to split the array into values that are smaller and values that
are larger than the pivot.
The only thing to worry about is what is the worst case in this setting, i.e.
what is the smallest number of values that are discarded in each iteration of
the algorithm. It has been proven that in each iteration of the algorithm the
pivot element selected in this way discards at least one quarter of all
elements still in focus. Consequently, we are never going to get into an unfair
situation in which array hardly reduces between the two iterations. On the
contrary, this is the proof that the array will converge towards the Kth
element quite quickly  to be perfectly clear about the time requirements, it
has been proven that number of comparisons required to isolate the Kth element
is proportional to number of elements in the array N. This algorithm has been
proposed in the early 1970s and ever since it has remained the most efficient
algorithm with guaranteed linear worst case running time. Compared to quadratic
worst time required by algorithms derived from QUICKSORT, it comes as no
surprise that this algorithm is still in use. Below is the whole algorithm  it
will sound quite simple when compared to pondering that lead to its discovery.
1. Divide the array into N/C columns of elements,
for small odd C.
2. Find the median of each column by sorting it.
3. Take only the medians and repeat steps 12 recursively
until only one value remains. That value is picked as the pivot.
4. Iterate through the array and count number of elements
strictly smaller than the pivot (S), larger than the pivot (L)
and equal to the pivot (E=NSL).
5. If N>K, move all values smaller than the pivot
to the beginning of the array and recursively run
the whole algorithm on that subarray.
6. If N+E>K, conclude that Kth element is equal
to the current pivot so return the pivot value
and terminate the algorithm.
7. Otherwise, move all values larger than the pivot
to the beginning of the array and recursively run
the whole algorithm on that subarray.
In the following section we are going to provide implementation of this
algorithm in C#. The only thing that should be discussed is step 2 of the
algorithm, in which medians for columns are extracted. The algorithm is
designed so that simple sorting with quadratic running time is applied to each
column. Performance of this step is guaranteed by the small value of C. There
comes the suggestion to take C=5  this advice was derived from the fact that
five values can be stored in processor's registers, making the sort operation
extremely fast in practice. Moreover, columns could be sorted only until the
middle element is sorted, since any element past that position cannot be any
smaller. But that optimization may cost more than it saves for small C. For
C=5, it would save only one comparison to sort the last two elements in the
column. In solution presented below, this last optimization is not applied.
Solution
In this section we are providing the complete source code for selecting Kth
smallest integer value from an unsorted array. This implementation takes C=5 as
assumed parameter. Generally, this value could also be passed to the function
as an argument.
int FindKthSmallest(int[] a, int k)
{
int value = 0;
int n = a.Length;
int c = 5;
while (true)
{
int pivot = FindPivot(a, n, c);
int smallerCount = 0;
int largerCount = 0;
CountElements(a, n, pivot, out smallerCount, out largerCount);
if (k < smallerCount)
{
Partition(a, ref n, pivot, true);
}
else if (k < n  largerCount)
{
value = pivot;
break;
}
else
{
k = n  largerCount;
Partition(a, ref n, pivot, false);
}
}
return value;
}
int FindPivot(int[] a, int n, int c)
{
while (n > 1)
{
int pos = 0;
int tmp = 0;
for (int start = 0; start < n; start += c)
{
int end = start + c;
if (end > n)
end = n;
for (int i = start; i < end  1; i++)
for (int j = i + 1; j < end; j++)
if (a[j] < a[i])
{
tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
end = (start + end) / 2;
tmp = a[end];
a[end] = a[pos];
a[pos++] = tmp;
}
n = pos;
}
return a[0];
}
void Partition(int[] a, ref int n, int pivot, bool extractSmaller)
{
int pos = 0;
for (int i = 0; i < n; i++)
if ((extractSmaller && a[i] < pivot) 
(!extractSmaller && a[i] > pivot))
{
int tmp = a[i];
a[i] = a[pos];
a[pos++] = tmp;
}
n = pos;
}
Demonstration
In this section we are going to present a short console application that
demonstrates the use of the FindKthSmallest function. Here is the complete source code of the console application:
using System;
namespace KthSmallest
{
public class Program
{
static Random _rnd = new Random();
static int[] GenerateArray(int n)
{
int[] array = new int[(Int64)n];
for (int i = 0; i < n; i++)
array[i] = _rnd.Next(100);
for (int i = array.Length  1; i > 0; i)
{
int index = _rnd.Next(i);
int tmp = array[index];
array[index] = array[i];
array[i] = tmp;
}
return array;
}
static void PrintArray(int[] array)
{
for (int i = 0; i < array.Length; i++)
Console.Write("{0,4}", array[i]);
Console.WriteLine();
}
static int FindKthSmallest(int[] a, int k) { ... }
static int FindPivot(int[] a, int n, int c) { ... }
static void CountElements(int[] a, int n, int pivot,
out int smallerCount,
out int largerCount) { ... }
static void Partition(int[] a, ref int n, int pivot,
bool extractSmaller) { ... }
static void Main(string[] args)
{
while (true)
{
Console.Write("n=");
int n = int.Parse(Console.ReadLine());
if (n <= 0)
break;
Console.Write("k=");
int k = int.Parse(Console.ReadLine());
int[] a = GenerateArray(n);
int value = FindKthSmallest(a, k);
PrintArray(a);
Console.WriteLine("{0}. smallest value is {1}", k, value);
}
Console.Write("Press ENTER to continue... ");
Console.ReadLine();
}
}
}
And this is one possible output produced by this demonstration application:
n=10
k=4
37 36 0 27 7 38 65 98 43 90
4. smallest value is 37
n=9
k=3
48 48 4 63 73 43 83 81 92
3. smallest value is 48
n=7
k=6
93 38 13 43 73 90 19
6. smallest value is 93
n=11
k=0
3 17 41 45 52 59 64 60 64 92 82
0. smallest value is 3
n=0
Press ENTER to continue...
FollowUp Exercises
Readers are suggested to try these extended tasks:
 Apply the function to very large arrays containing random data (millions to
hundreds of millions of integers). Measure how time required to produce the
result grows with growing size of the array.
 Try to solve the problem by using Array.Sort and picking the Kth element from the fully sorted array. Repeat the
performance test to measure how sorting time increases with growing size of the
array.
 Compare absolute times obtained in previous tests and discuss the differences.
See also:
Published: Apr 27, 2013; Modified: Feb 13, 2015
ZORAN HORVAT Zoran is software architect dedicated to clean design and team lead in a growing software company. Follow him on Twitter @zoranh75 to receive updates and links to new articles. Watch Zoran's video course Improving Testability Through Design on Pluralsight. Share this article
