问题描述
There are two sorted arrays nums1 and nums2 of size m and n respectively.
Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
You may assume nums1 and nums2 cannot be both empty.
Example 1:
nums1 = [1, 3]
nums2 = [2]
The median is 2.0
Example 2:
nums1 = [1, 2]
nums2 = [3, 4]
The median is (2 + 3)/2 = 2.5
解题思路-暴力
连接两个数组,排序,取中位数。(其实还是挺快的)
牛逼思路
首先从数学上理解,什么是中位数。从统计学上讲,中位数用来 ”将一个集合分为等长的两个子集,使得其中一个子集中的元素总是比另一个中的大“。
首先,将集合 A 以随机值 i 为中心划分为两个部分:
left_A | right_A
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]
集合 A 总共包含 m 个元素,所以有 m+1 种划分方法(i = 0~m).而且 len(left_A) = i,len(right_A) = m-i. 注意: 当 i = 0 时,left_A 为空。当 i = m 时,right_A 为空。
使用同样的方式,可以用一个随机的 j 将 B 划分为两部分。
left_B | right_B
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]
将 left_A 和 left_B 放到同一个集合中,right_A、right_B 放到另一个集合中。
left_part | right_part
A[0], A[1], ..., A[i-1] | A[i], A[i+1], ..., A[m-1]
B[0], B[1], ..., B[j-1] | B[j], B[j+1], ..., B[n-1]
如果我们能保证
1 len(left_part) == len(right_part)
2 max(left_part) <= min(right_part)
我们就已经将 A、B 划分为相等大小的两个集合,其中一个集合中的元素一定比另一个大。所以此时 median = (max(left_part) + min(right_part))/2.
为了保证以上两个条件。我们只需要保证
(1) i + j== m- i + n - j (or: m - i + n - j + 1)
如果 n >= m, 我们需要设置 i = 0 ~ m, j = (m + n + 1)/2 -i
(2) B[j-1] <= A[i] 和 A[i-1] <= B[j]
所以我们只需要:
在 [0, m] 的范围内遍历 i,找到一个 i 满足以下条件:
B[j-1] <= A[i] and A[i-1] <= B[j] (j = (m + n + 1)/2 - i)
我们可以使用二分查找的方法。算法步骤如下:
<1> Set imin = 0, imax = m, then start searching in [imin, imax]
<2> Set i = (imin + imax)/2, j = (m + n + 1)/2 - i
<3> Now we have len(left_part)==len(right_part). And there are only 3 situations
that we may encounter:
<a> B[j-1] <= A[i] and A[i-1] <= B[j]
Means we have found the object `i`, so stop searching.
<b> B[j-1] > A[i]
Means A[i] is too small. We must `ajust` i to get `B[j-1] <= A[i]`.
Can we `increase` i?
Yes. Because when i is increased, j will be decreased.
So B[j-1] is decreased and A[i] is increased, and `B[j-1] <= A[i]` may
be satisfied.
Can we `decrease` i?
`No!` Because when i is decreased, j will be increased.
So B[j-1] is increased and A[i] is decreased, and B[j-1] <= A[i] will
be never satisfied.
So we must `increase` i. That is, we must ajust the searching range to
[i+1, imax]. So, set imin = i+1, and goto <2>.
<c> A[i-1] > B[j]
Means A[i-1] is too big. And we must `decrease` i to get `A[i-1]<=B[j]`.
That is, we must ajust the searching range to [imin, i-1].
So, set imax = i-1, and goto <2>.
当我们找到了满足条件的 i 时,中位数为
max(A[i-1], B[j-1]) (when m + n is odd)
or (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 (when m + n is even)
现在来考虑边界值 i=0, i=m, j=0, j=m,在这些取值中 A[i-1] B[j-1] A[i] B[j] 可能不存在。
Searching i in [0, m], to find an object `i` that:
(j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j == n or A[i-1] <= B[j])
where j = (m + n + 1)/2 - i
在一个查找循环中,我们只会遇到三种情形。
<a> (j == 0 or i == m or B[j-1] <= A[i]) and
(i == 0 or j = n or A[i-1] <= B[j])
Means i is perfect, we can stop searching.
<b> j > 0 and i < m and B[j - 1] > A[i]
Means i is too small, we must increase it.
<c> i > 0 and j < n and A[i - 1] > B[j]
Means i is too big, we must decrease it.
因为 i < m ==> j > 0 and i > 0 ==> j < n
m <= n, i < m ==> j = (m+n+1)/2 - i > (m+n+1)/2 - m >= (2*m+1)/2 - m >= 0
m <= n, i > 0 ==> j = (m+n+1)/2 - i < (m+n+1)/2 <= (2*n+1)/2 <= n
所以在情形 b c 中,我们不需要检查 j 大于 0 或小于 n.
代码如下:
1 | def median(A, B): |