“Binary Test” is a name I gave to a theoretical testing method to identify the infected samples. I got the idea from this tweet
If you have a limited number of tests but unlimited ability to take individual samples what would be the ideal way to use them? If you expect a single positive result per 128 tests should you mix samples and binary search? Is there a better algorithm for other distributions?— Sam Pullara (@sampullara) March 13, 2020
It is a similar core idea as the Binary Search algorithm, hence the
name, but instead of looking for one specific sample, we are trying to
identify all samples that are infected. The basic idea is, you mix all
samples, perform one single test on the batch – if it is negative then
none of the samples are infected, if it is positive then you split the
batch into two and recurse.
Lets say we have 128 samples, numbered arbitrarily. We mix all 128, and test the batch. It comes out positive, so there must be at least one positive sample. Now we split the samples into two groups. 1-64 and 65-128. We mix the groups separately, and perform the test on each of them. Lets say the test on 1-64 is negative, then there are no infected samples in the group, and test on 65-128 is positive – so there must be an infected sample in this group. Repeat the process until we identify all infected samples. Ideally, this should require less tests than if we were to test each samples separately.
In the experiment described here, we are making a few key assumptions:
Lets say we have 128 samples, numbered arbitrarily. We mix all 128, and test the batch. It comes out positive, so there must be at least one positive sample. Now we split the samples into two groups. 1-64 and 65-128. We mix the groups separately, and perform the test on each of them. Lets say the test on 1-64 is negative, then there are no infected samples in the group, and test on 65-128 is positive – so there must be an infected sample in this group. Repeat the process until we identify all infected samples. Ideally, this should require less tests than if we were to test each samples separately.
In the experiment described here, we are making a few key assumptions:
- Test is 100% accurate, no false negatives or false positives.
- In a mix of any number of samples, as long as at least one sample is infected, the test will return positive
- Samples can be mixed, and we have enough of each of them
- We only take samples in quantities of powers of 2
This graph shows the average number of tests it takes to identify all
infected samples, with probability of infection from 0 to 1, with
intervals of 1/128, and 128 samples. The red curve is the average number
of tests using Binary Test, green line is the number of tests using
linear testing – so the same as number of samples, in this case 128. As we can infer from the graph, for p < 0.18, Binary Test method
requires fewer number of tests than the number of samples – so it is
more efficient than linear testing. The transition happens at around p ~
0.18 which, is true for any number of
samples – so for high infection rate, linear testing is preferable.
If we have 128 samples, and p = 1/128, then the average number of tests
required is about 13, with standard deviation of about 11. The graph
shows approximation to normal distribution for the number of tests
required, in practice the number of tests will never be below 1.
The standard deviation also goes down with larger numbers of samples.
The graph shows standard deviation as a fraction of the number of
samples, with probability of infection 1/128.
Overall, if the above assumptions can be satisfied, Binary Test
method can be used to identify infected samples, on average, using fewer
test than linear testing given that infection rate is sufficiently
small.
Below is the code for the binary_test function, with inputs arr, start, end.
arr – a bool array of the samples. Infected samples have value “True,” healthy samples have value “False.”
start, end – start and end indices for the subset of array being tested
group_test is a helper function that tests every sample in the group, and returns “True” if at least one of them is infected – this is a way to simulate testing a mixed batch.
Below is the code for the binary_test function, with inputs arr, start, end.
arr – a bool array of the samples. Infected samples have value “True,” healthy samples have value “False.”
start, end – start and end indices for the subset of array being tested
group_test is a helper function that tests every sample in the group, and returns “True” if at least one of them is infected – this is a way to simulate testing a mixed batch.
def binary_test(arr, start, end):
results = []
#Initial test
if group_test(arr) == False:
results.append(False)
else:
#divide samples into two groups
if end-start>1 and len(arr)>1:
mid = (start+end)//2
#test each group
group_1 = group_test(arr[start:mid])
group_2 = group_test(arr[mid:end])
#if a group test positive, recurse binary_test on that group
if group_1:
results.append(binary_test(arr, start, mid))
else:
results.append(False)
if group_2:
results.append(binary_test(arr, mid, end))
else:
results.append(False)
else:
results.append(arr[start:end])
#return bool array
return results
The result is a nested array. For example, an array of 16 samples with one infected will return[False, [False, [False, [[[True]], False]]]]
In the most outer array, the first entry is “False” meaning the first half of the samples are all healthy (1-8). The first entry of the second level array is also “False” meaning the first half of the second group is also all healthy (9-12). The third level array also has “False” as the first entry, so samples (13-14) are also healthy. Finally, in the deepest array the first entry is “True” and the second is “False” therefore sample 15 is infected, and sample 16 is healthy.
The full code with all helper functions, including a function that deciphers binary_test result, can be found on Github.