Leetcode 4. Median of Two Sorted Arrays

问题描述

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

You may assume nums1 and nums2 cannot be both empty.

Example 1:

nums1 = [1, 3]
nums2 = [2]

The median is 2.0

Example 2:

nums1 = [1, 2]
nums2 = [3, 4]

The median is (2 + 3)/2 = 2.5

解题思路-暴力

连接两个数组,排序,取中位数。(其实还是挺快的)

牛逼思路

首先从数学上理解,什么是中位数。从统计学上讲,中位数用来 ”将一个集合分为等长的两个子集,使得其中一个子集中的元素总是比另一个中的大“。

首先,将集合 A 以随机值 i 为中心划分为两个部分:

        left_A          |          right_A
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]

集合 A 总共包含 m 个元素,所以有 m+1 种划分方法(i = 0~m).而且 len(left_A) = i,len(right_A) = m-i. 注意: 当 i = 0 时,left_A 为空。当 i = m 时,right_A 为空。

使用同样的方式,可以用一个随机的 j 将 B 划分为两部分。

        left_B          |          right_B
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]

将 left_A 和 left_B 放到同一个集合中,right_A、right_B 放到另一个集合中。

       left_part        |        right_part
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]

如果我们能保证

1 len(left_part) == len(right_part)
2 max(left_part) <= min(right_part)

我们就已经将 A、B 划分为相等大小的两个集合,其中一个集合中的元素一定比另一个大。所以此时 median = (max(left_part) + min(right_part))/2.

为了保证以上两个条件。我们只需要保证

(1) i + j== m- i + n - j (or: m - i + n - j + 1)
    如果 n >= m, 我们需要设置 i = 0 ~ m, j = (m + n + 1)/2 -i
(2) B[j-1] <= A[i] 和 A[i-1] <= B[j]

所以我们只需要:

在 [0, m] 的范围内遍历 i,找到一个 i 满足以下条件:
    B[j-1] <= A[i] and A[i-1] <= B[j] (j = (m + n + 1)/2 - i)

我们可以使用二分查找的方法。算法步骤如下:

<1> Set imin = 0, imax = m, then start searching in [imin, imax]

<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i

<3> Now we have len(left_part)==len(right_part). And there are only 3 situations
    that we may encounter:
    <a> B[j-1] <= A[i] and A[i-1] <= B[j]
        Means we have found the object `i`, so stop searching.
    <b> B[j-1] > A[i]
        Means A[i] is too small. We must `ajust` i to get `B[j-1] <= A[i]`.
        Can we `increase` i?
            Yes. Because when i is increased, j will be decreased.
            So B[j-1] is decreased and A[i] is increased, and `B[j-1] <= A[i]` may
            be satisfied.
        Can we `decrease` i?
            `No!` Because when i is decreased, j will be increased.
            So B[j-1] is increased and A[i] is decreased, and B[j-1] <= A[i] will
            be never satisfied.
        So we must `increase` i. That is, we must ajust the searching range to
        [i+1, imax]. So, set imin = i+1, and goto <2>.
    <c> A[i-1] > B[j]
        Means A[i-1] is too big. And we must `decrease` i to get `A[i-1]<=B[j]`.
        That is, we must ajust the searching range to [imin, i-1].
        So, set imax = i-1, and goto <2>.

当我们找到了满足条件的 i 时,中位数为

max(A[i-1], B[j-1]) (when m + n is odd)
or (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 (when m + n is even)

现在来考虑边界值 i=0, i=m, j=0, j=m,在这些取值中 A[i-1] B[j-1] A[i] B[j] 可能不存在。

Searching i in [0, m], to find an object `i` that:
(j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j == n or A[i-1] <= B[j])
where j = (m + n + 1)/2 - i

在一个查找循环中,我们只会遇到三种情形。

<a> (j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j = n or A[i-1] <= B[j])
Means i is perfect, we can stop searching.

<b> j > 0 and i < m and B[j - 1] > A[i]
    Means i is too small, we must increase it.

<c> i > 0 and j < n and A[i - 1] > B[j]
    Means i is too big, we must decrease it.

因为 i < m ==> j > 0 and i > 0 ==> j < n

m <= n, i < m ==> j = (m+n+1)/2 - i > (m+n+1)/2 - m >= (2*m+1)/2 - m >= 0
m <= n, i > 0 ==> j = (m+n+1)/2 - i < (m+n+1)/2 <= (2*n+1)/2 <= n

所以在情形 b c 中,我们不需要检查 j 大于 0 或小于 n.

代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def median(A, B):
m, n = len(A), len(B)
if m > n:
A, B, m, n = B, A, n, m
if n == 0:
raise ValueError

imin, imax, half_len = 0, m, (m + n + 1) / 2
while imin <= imax:
i = (imin + imax) / 2
j = half_len - i
if i < m and B[j-1] > A[i]:
# i is too small, must increase it
imin = i + 1
elif i > 0 and A[i-1] > B[j]:
# i is too big, must decrease it
imax = i - 1
else:
# i is perfect

if i == 0: max_of_left = B[j-1]
elif j == 0: max_of_left = A[i-1]
else: max_of_left = max(A[i-1], B[j-1])

if (m + n) % 2 == 1:
return max_of_left

if i == m: min_of_right = B[j]
elif j == n: min_of_right = A[i]
else: min_of_right = min(A[i], B[j])

return (max_of_left + min_of_right) / 2.0