Friday, July 10, 2020

Binary Test Analysis

“Binary Test” is a name I gave to a theoretical testing method to identify the infected samples. I got the idea from this tweet

It is a similar core idea as the Binary Search algorithm, hence the name, but instead of looking for one specific sample, we are trying to identify all samples that are infected. The basic idea is, you mix all samples, perform one single test on the batch – if it is negative then none of the samples are infected, if it is positive then you split the batch into two and recurse.
Lets say we have 128 samples, numbered arbitrarily. We mix all 128, and test the batch. It comes out positive, so there must be at least one positive sample. Now we split the samples into two groups. 1-64 and 65-128. We mix the groups separately, and perform the test on each of them. Lets say the test on 1-64 is negative, then there are no infected samples in the group, and test on 65-128 is positive – so there must be an infected sample in this group. Repeat the process until we identify all infected samples. Ideally, this should require less tests than if we were to test each samples separately.
In the experiment described here, we are making a few key assumptions:
  • Test is 100% accurate, no false negatives or false positives.
  • In a mix of any number of samples, as long as at least one sample is infected, the test will return positive
  • Samples can be mixed, and we have enough of each of them
  • We only take samples in quantities of powers of 2

This graph shows the average number of tests it takes to identify all infected samples, with probability of infection from 0 to 1, with intervals of 1/128, and 128 samples. The red curve is the average number of tests using Binary Test, green line is the number of tests using linear testing – so the same as number of samples, in this case 128. As we can infer from the graph, for p < 0.18, Binary Test method requires fewer number of tests than the number of samples – so it is more efficient than linear testing. The transition happens at around p ~ 0.18 which, is true for any number of samples – so for high infection rate, linear testing is preferable.
If we have 128 samples, and p = 1/128, then the average number of tests required is about 13, with standard deviation of about 11. The graph shows approximation to normal distribution for the number of tests required, in practice the number of tests will never be below 1.
The standard deviation also goes down with larger numbers of samples. The graph shows standard deviation as a fraction of the number of samples, with probability of infection 1/128.

Overall, if the above assumptions can be satisfied, Binary Test method can be used to identify infected samples, on average, using fewer test than linear testing given that infection rate is sufficiently small.




Below is the code for the binary_test function, with inputs arr, start, end.
arr – a bool array of the samples. Infected samples have value “True,” healthy samples have value “False.”
start, end – start and end indices for the subset of array being tested
group_test is a helper function that tests every sample in the group, and returns “True” if at least one of them is infected – this is a way to simulate testing a mixed batch.

 def binary_test(arr, start, end):
    results = []
    #Initial test
    if group_test(arr) == False:
        results.append(False)
    else:
    #divide samples into two groups
        if end-start>1 and len(arr)>1:
            mid  = (start+end)//2
            #test each group
            group_1 = group_test(arr[start:mid])
            group_2 = group_test(arr[mid:end])
            #if a group test positive, recurse binary_test on that group
            if group_1:
                results.append(binary_test(arr, start, mid))
            else:
                results.append(False)
            if group_2:
                results.append(binary_test(arr, mid, end))
            else:
                results.append(False)
        else:
            results.append(arr[start:end])
        
    #return bool array
    return results 


The result is a nested array. For example, an array of 16 samples with one infected will return
[False, [False, [False, [[[True]], False]]]]
In the most outer array, the first entry is “False” meaning the first half of the samples are all healthy (1-8). The first entry of the second level array is also “False” meaning the first half of the second group is also all healthy (9-12). The third level array also has “False” as the first entry, so samples (13-14) are also healthy. Finally, in the deepest array the first entry is “True” and the second is “False” therefore sample 15 is infected, and sample 16 is healthy. The full code with all helper functions, including a function that deciphers binary_test result, can be found on Github.