The IDS Challenge
When is an application related or unrelated?
Under Dayco Products, co-pending applications that contain “similar subject matter but patentably indistinct claims” must be disclosed to the Examiner(s) in each of the involved applications. See Dayco Prod., Inc. v. Total Containment, Inc., 329 F.3d 1358 (Fed. Cir. 2003). In defining “similar subject matter,” the Dayco Products court held that applications in separate families whose independent claims contained the exact same terms triggered the duty to disclose. The fact pattern can get more complicated, however, as in McKesson. There, the independent claims differed in scope (one application claimed a system, while another claimed a method and individual parts of the system), yet the court held that both applications still contained “similar subject matter.” McKesson Info. Solutions, Inc. v. Bridge Med., Inc., 487 F.3d 897, 913 (Fed. Cir. 2007). Elements added in dependent claims can also make the claims in the second case similar to the system claims of the first case. See id. at 911, 918.
Muddying the waters still further, identical language in claims does not necessarily mean that the claims are drawn to the same invention, because every claim must be construed in light of the application in which it appears. Claim amendments made during prosecution of a previously unrelated application can turn a “patentably indistinct” claim into a “patentably distinct” one, or vice versa. In other words, each set of pending claims must be compared against the claims of every potentially related application.
Separating related from unrelated applications is labor-intensive and requires examining the applications on a claim-by-claim basis. If the Applicant has a large patent estate, grouping the applications can balloon into a lengthy process that must be updated each time claim amendments are filed in any one of the applications.
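To make the scale concrete, here is a small Python sketch, assuming each pair of applications is one unit of comparison (a simplification, since each pair may involve many claims). The function names and application IDs are hypothetical; the portfolio size of 500 matches the company studied below.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered application pairs: n choose 2 = n(n - 1) / 2."""
    return n * (n - 1) // 2


def all_pairs(app_ids):
    """Every unordered pair of applications whose claim sets would need comparing."""
    return list(combinations(app_ids, 2))


def pairs_to_recheck(app_ids, amended_id):
    """Pairs that involve the amended application and so must be compared again."""
    return [pair for pair in combinations(app_ids, 2) if amended_id in pair]


if __name__ == "__main__":
    apps = [f"APP-{i:03d}" for i in range(500)]  # hypothetical application IDs
    print(pair_count(len(apps)))                   # 124750
    print(len(pairs_to_recheck(apps, "APP-042")))  # 499
```

The count grows quadratically: 500 applications yield 124,750 pairs, and a single amendment in one application reopens 499 of them.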
How we tackled the problem
We applied text processing and a naive Bayes classifier to published patent claims. We first obtained the claims and inventor names of the published applications of a company having at least 500 filed applications in a single art unit. For a training set, we labeled which applications (in our opinion) should be grouped together as related or unrelated; a registered attorney considered continuations, divisionals, CIPs, and shared inventorship in making this initial determination. We then pre-processed the claims text, transformed it into feature vectors, and applied the naive Bayes classifier to sort the documents into the two classes. The attorney then reviewed the classified applications and, in most cases, agreed with the classification. Our accuracy statistics showed a true negative rate of 6 and a false negative rate of 9.
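We do not specify the vectorizer or naive Bayes variant above. As a rough sketch, assuming a bag-of-words count representation and a multinomial naive Bayes with add-one (Laplace) smoothing, the pipeline might look like this in plain Python. The stopword list, the `NaiveBayes` class, and the toy claims are hypothetical and are not the data or code from our study.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: common English words plus claim boilerplate.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in",
             "wherein", "comprising", "said", "claim"}


def preprocess(text):
    """Lowercase, keep runs of letters only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial naive Bayes on bag-of-words counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # log P(class), estimated from class frequencies in the training labels
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        # total token count per class, the denominator of P(word | class)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Unnormalized log P(class | doc) for each class."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for t in preprocess(doc):
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, made-up claim snippets with hypothetical labels.
    train_docs = [
        "A valve assembly comprising a housing and a spring-loaded seal.",
        "The valve assembly of claim 1 wherein the seal is elastomeric.",
        "A method of encrypting a data packet using a session key.",
        "A system for routing network traffic based on packet headers.",
    ]
    train_labels = ["related", "related", "unrelated", "unrelated"]
    model = NaiveBayes().fit(train_docs, train_labels)
    print(model.predict("A seal for a valve housing."))  # related
```

scikit-learn offers equivalent building blocks (`CountVectorizer`, `MultinomialNB`); the hand-rolled version simply makes the smoothing and log-probability arithmetic visible.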
Given the high true negative and false negative rates, we then classified the data into three categories: related, unrelated, and undefined. For now, we set aside the fact that the same dataset was used both to train and to evaluate the model. We did not partition the data into separate training and test sets, and we acknowledge that the results are therefore optimistic because of over-fitting.
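With three categories, “true negative” and “false negative” only make sense relative to a chosen positive class. A minimal sketch of the bookkeeping, assuming “related” is treated as the positive class (our choice for illustration) and using made-up labels; the function names are hypothetical. If the predictions come from the same data the model was trained on, as in our experiment, these counts will overstate real-world performance.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]


def one_vs_rest(y_true, y_pred, positive):
    """TP/FP/TN/FN counts treating `positive` as positive and every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(t == positive and p == positive for t, p in pairs),
        "FP": sum(t != positive and p == positive for t, p in pairs),
        "TN": sum(t != positive and p != positive for t, p in pairs),
        "FN": sum(t == positive and p != positive for t, p in pairs),
    }


if __name__ == "__main__":
    labels = ["related", "unrelated", "undefined"]
    # Hypothetical attorney labels vs. model predictions.
    y_true = ["related", "related", "unrelated", "undefined", "unrelated", "related"]
    y_pred = ["related", "unrelated", "unrelated", "related", "undefined", "related"]
    print(confusion_matrix(y_true, y_pred, labels))  # [[2, 1, 0], [0, 1, 1], [1, 0, 0]]
    print(one_vs_rest(y_true, y_pred, "related"))    # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```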
With the additional category, the predictor performed less well, as expected, particularly given the small sample size. However, the difference between the related and unrelated classes stayed the same. Because the model is specific to this pre-processed text, it may perform poorly on other claim sets.
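One rough diagnostic for this limitation (a hypothetical check, not something we ran in this study) is the out-of-vocabulary rate: the share of words in a new claim set that never appeared in the training claims. A bag-of-words naive Bayes model has no learned evidence for such words, so a high rate suggests the model is operating outside the text it was fit to. A minimal sketch, with made-up example text:

```python
import re


def tokens(text):
    """Lowercased runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def oov_rate(train_texts, new_texts):
    """Fraction of tokens in new_texts whose word never appears in train_texts."""
    vocab = {t for text in train_texts for t in tokens(text)}
    new_tokens = [t for text in new_texts for t in tokens(text)]
    if not new_tokens:
        return 0.0
    unseen = sum(t not in vocab for t in new_tokens)
    return unseen / len(new_tokens)


if __name__ == "__main__":
    train = ["A valve housing with a seal."]
    new = ["A valve with a rotor."]
    print(oov_rate(train, new))  # 0.2
```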
Although the method requires refinement, it points to a possible way to quickly triage applications into related and unrelated groups. Further research and modeling, including bootstrapping and cross-validation to better assess the data, are required to confirm the results. Differences in the source data will certainly affect future classifications.
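We have not settled how the cross-validation or bootstrapping would be set up. A minimal sketch, assuming plain (unstratified) k-fold splits and a simple bootstrap with out-of-bag evaluation, of how the labeled applications could be split so that the classifier is never scored on applications it was trained on. The function names, `k=5`, and the seeds are illustrative choices. With a small, possibly imbalanced sample, stratified folds (which preserve the related/unrelated ratio in each fold) may be preferable.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds over n shuffled item indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split: fold sizes differ by at most 1
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def bootstrap_indices(n, seed=0):
    """One bootstrap resample: n draws with replacement, plus the out-of-bag indices never drawn."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))  # 8 2 on each of the 5 lines
    sample, oob = bootstrap_indices(10)
    print(len(sample), oob)
```

Each `(train, test)` pair would be used to fit the classifier on the training indices and tally confusion counts on the test indices only; repeating the bootstrap many times gives a spread of estimates rather than a single, optimistic number.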