dynamo holdings limited_partnership dynamo gp inc tax_matters_partner petitioner v commissioner of internal revenue respondent beekman vista inc petitioner v commissioner of internal revenue respondent docket nos filed date r requests that ps produce electronically stored information contained on two backup storage tapes or alternatively the tapes themselves or copies thereof ps acknowledge that the tapes contain tax-related information but assert that the tapes also contain privileged information that ps have a right or duty to protect ps assert that they must review the responsive information on the tapes before giving the information to r to ensure that privileged or confidential information is not disclosed ps request that the court let them use predictive coding a technique prevalent in the technology industry but not yet formally sanctioned by this court to help identify the information that is responsive to r’s request held ps may use predictive coding in responding to r’s request martin r press edward a marod lu-ann mancini dominguez and alan stuart lederman for petitioners david b flassing and lisa goldberg for respondent united_states tax_court reports opinion buch judge these consolidated cases are before the court on respondent’s motion to compel production of documents the cases concern various transfers from beekman vista inc beekman to a related_entity dynamo holdings limited_partnership dynamo respondent determined that the transfers are disguised gifts to dynamo’s owners petitioners assert that the transfers are loans respondent requests that petitioners produce the electronically stored information esi contained on two specified backup storage tapes or alternatively that they produce the tapes themselves or copies thereof petitioners assert that it will take many months and cost at least dollar_figure to fulfill respondent’s request because they would need to review each document on the tapes to identify what is responsive and then
withhold privileged or confidential information petitioners request that the court deny respondent’s motion as a fishing expedition in search of new issues that could be raised in these or other cases alternatively petitioners request that the court let them use predictive coding a technique prevalent in the technology industry but not yet formally sanctioned by this court to efficiently and economically identify the nonprivileged information responsive to respondent’s discovery request respondent counters that he wants the backup tapes to review the esi’s metadata and verify the dates on which certain documents were created respondent states that he also wants the backup tapes to ascertain all transfers relevant to this proceeding respondent opposes petitioners’ request to use predictive coding because he states predictive coding is an unproven technology respondent adds that petitioners need not devote their claimed time or expense to this matter because they can simply give him access to all data on the two tapes and preserve the right through a clawback agreement to later claim that some or all of the data is privileged information not subject_to discovery
respondent also moved to compel interrogatories we will separately address that motion in an order
we understand respondent’s use of the term clawback agreement to mean that the disclosure of any privileged information on the tapes would not be a waiver of any privilege that would otherwise apply to that information
the court held an evidentiary hearing on respondent’s motion we will grant respondent’s motion to the limited extent stated herein specifically we hold that petitioners must respond to respondent’s discovery request but that they may use predictive coding in doing so background i relevant entities a beekman beekman is a corporation wholly owned by a canadian entity which is controlled by delia moog beekman’s mailing address was in florida when its petition was filed b dynamo dynamo is a limited_partnership owned by a corporation and two trusts that were established for ms moog’s daughter and nephew dynamo’s tax_matters_partner is dynamo gp inc dynamo through its tax_matters_partner alleges that its principal_place_of_business was in delaware when its petition was filed respondent alleges that dynamo’s principal_place_of_business was in florida at that time ii backup tapes dynamo backs up onto tapes its entire exchange server inclusive of emails operating system and configuration information dynamo performs this backup work every four weeks and at the end of every month dynamo generally retains its backup tapes for one year respondent seeks two of the backup tapes specifically the month end date orange and the month end jan orange these tapes contain data backed up from an exchange server and a domain controller and file server ksh-dc the exchange server database has approximately mailboxes ranging in size from megabytes to gigabyte each the ksh-dc has a common group and a user group the common group has shares where assigned users may store data to be shared with other assigned users the common group has approximately common top-level file shares and an undetermined number of subfolders and ownership of these files may not be limited to the authors of the documents the user group is in a section of the network assigned to a specific individual and has approximately user share folders iii petitioners’ request to use predictive coding petitioners acknowledge that the two requested backup tapes contain tax-related information but assert that the tapes also contain personal identification information health insurance information hipaa protected information and other confidential information that petitioners have a duty to protect petitioners assert that if they must respond to respondent’s discovery request they must review the documents on the backup tapes to ensure that no
privileged or confidential information is disclosed before giving any information to respondent petitioners ask the court to let them use predictive coding to efficiently and economically help identify the nonprivileged information that is responsive to respondent’s discovery request more specifically petitioners want to implement the following procedure to respond to the request
restore some or all of the data from the tapes
qualify the restored data ie remove nist files system files etc
index and load the qualified restored data into a review environment
apply criteria to the loaded data to remove duplicate messages and other nonrelevant information
through the implementation of predictive coding review the remaining data using search criteria that the parties agree upon to ascertain on the one hand information that is relevant to the matter and on the other hand potentially relevant information that should be withheld as privileged or confidential information
produce the relevant nonprivileged information and a privilege log that sets forth the claimed privileged documents and sufficient information supporting that claim
the health insurance portability and accountability act of hipaa pub_l_no secs stat pincite3 contains privacy rules and gave rise to privacy regulations relating to individually identifiable health information
the national institute of standards and technology nist which is an agency of the u s department of commerce maintains a database of hash values of files that typically are part of an operating system or a piece of software a hash value which is essentially a fingerprint of a file is a numeric computation of a file’s content which is used to identify the file two files with the same hash values are exact copies of each other
discussion i discovery in general a party in this court generally may obtain discovery of documents and esi to the extent that the information contained therein is not
privileged and is relevant to the subject matter of the case see rule a b see also rule a in this context documents and esi include writings drawings graphs charts photographs sound_recordings images and other data compilations stored in any medium from which information can be obtained either directly or translated if necessary by the responding party into a reasonably usable form rule a
rule references are to the tax_court rules_of_practice and procedure
rule a provides rule production of documents electronically stored information and things a scope any party may without leave of court serve on any other party a request to produce and permit the party making the request or someone acting on such party’s behalf to inspect and copy test or sample any designated documents or electronically stored information including writings drawings graphs charts photographs sound_recordings images and other data compilations stored in any medium from which information can be obtained either directly or translated if necessary by the responding party into a reasonably usable form or to inspect and copy test or sample any tangible thing to the extent that any of the foregoing items are in the possession custody or control of the party on whom the request is served
literature on electronic data storage has characterized electronically stored data as falling within five categories see 217_frd_309 s d n y these categories are active online data eg hard drives near-line data eg optical disks offline storage archives ie removable optical disk or magnetic tape media backup tapes ie a device that reads data from and writes it onto a tape and fragmented erased or damaged data fragmented data consists of files that are broken up and placed randomly throughout the disk see id pincite the first three categories are generally considered accessible while the remaining categories are generally considered inaccessible see id pincite
and a party is generally required to produce documents or electronically stored information in the form in which they are maintained rule b a party however is not required to provide discovery of esi from sources that the party establishes are not reasonably accessible because of undue burden or cost unless the court concludes that the requesting party has shown good cause for the discovery see rule c
petitioners do not claim that if they use predictive coding the requested esi is not reasonably accessible because of undue burden or cost
these rules are all similar to corresponding provisions found in the federal rules of civil procedure see fed r civ p a a b e b b ii respondent’s request respondent requests access to petitioners’ esi petitioners resist this request primarily because of cost and of concern that privileged or confidential information will be improperly disclosed respondent essentially responds that he can alleviate both concerns if petitioners give him all of the requested information with the condition that he will allow them to later claim that some or all of that information should not be disclosed further because it is privileged petitioners remain mindful of their need to protect their privileged or confidential information as well as the projected cost of protecting that information and ask the court to allow them to use predictive coding in responding to respondent’s request in this respect we note that this request is somewhat unusual our rules are clear that the court expects the parties to attempt to attain the objectives of discovery through informal consultation or communication before resorting to formal discovery procedures rule a and although it is a proper role of the court to supervise the discovery process and intervene when it is abused by the parties the court is not normally in the business of dictating to parties the process that they should use when responding to discovery if our focus were on paper discovery we would not for example be dictating to a party the manner in which it should review documents for responsiveness or privilege such as whether that review should be done by a paralegal a junior attorney or a senior attorney yet that is in essence what the parties are asking the court to consider whether document review should be done by humans or with the assistance of computers respondent fears an incomplete response to his discovery request if respondent believes that the ultimate discovery response is incomplete and can support that belief he can file another motion to compel at that time nonetheless because we have not previously addressed the issue of computer-assisted review tools we will address it here iii expert witnesses each party called a witness to testify at the evidentiary hearing as an expert petitioners’ witness was james r scarazzo respondent’s witness was michael l wudke the court recognized the witnesses as experts on the subject matter at hand we may accept or reject the findings and conclusions of the experts according to our own judgment see 140_tc_294 we also may be selective in deciding what parts if any of their opinions to accept see id iv analysis the court applies the standard of relevancy liberally when it comes to matters of discovery see eg 73_tc_469 and a party challenging the requested production of a document including esi has the burden of establishing that the document is not discoverable see 81_tc_937 64_tc_191 we believe that respondent’s request for the esi is within the bounds of our rules and petitioners do not appear to contest this point at the same time however we are faced with the competing interests of the parties on one hand we do not consider it appropriate to order petitioners to give all of their esi to respondent subject_to a right to later claim that some or all of the information that he has reviewed is privileged or confidential information and thus outside the bounds of discovery
although the use of a clawback agreement may be an option to which the parties might consent petitioners reasonably resist entering into any such agreement as part of a plan under which they would voluntarily allow respondent to see all of the privileged or confidential information on the requested tapes on the other hand given the time and expense involved with petitioners’ review of all the esi to identify any privileged or confidential information we likewise do not consider it appropriate to order petitioners to go to that extreme either we find a potential happy medium in petitioners’ proposed use of predictive coding predictive coding is an expedited and efficient form of computer-assisted review that allows parties in litigation to avoid the time and costs associated with the traditional manual review of large volumes of documents through the coding of a relatively small sample of documents computers can predict the relevance of documents to a discovery request and then identify which documents are and are not responsive the parties typically through their counsel or experts select a sample of documents from the universe of those documents to be searched by using search criteria that may for example consist of keywords dates custodians and document types and the selected documents become the primary data used to cause the predictive coding software to recognize patterns of relevance in the universe of documents under review the software distinguishes what is relevant and each iteration produces a smaller relevant subset and a larger set of irrelevant documents that can be used to verify the integrity of the results through the use of predictive coding a party responding to discovery is left with a smaller set of documents to review for privileged information resulting in savings both in time and in expense the party responding to the discovery request also is able to give the other party a log detailing
the records that were withheld and the reasons they were withheld magistrate judge andrew peck published a leading oft-cited article on predictive coding which is helpful to our understanding of that method see andrew peck search forward will manual document review and keyword searches be replaced by computer-assisted coding l tech news date the article generally discusses the mechanics of predictive coding and the shortcomings of manual review and of keyword searches the article explains that predictive coding is a form of computer-assisted coding which in turn means tools that use sophisticated algorithms to enable the computer to determine relevance based on interaction with ie training by a human reviewer id pincite the article explains that unlike manual review where the review is done by the most junior staff computer-assisted coding involves a senior partner or team who review and code a seed set of documents the computer identifies properties of those documents that it uses to code other documents as the senior reviewer continues to code more sample documents the computer predicts the reviewer’s coding or the computer codes some documents and asks the senior reviewer for feedback when the system’s predictions and the reviewer’s coding sufficiently coincide the system has learned enough to make confident predictions for the remaining documents typically the senior lawyer or team needs to review only a few thousand documents to train the computer some systems produce a simple yes no as to relevance while others give a relevance score say on a to basis that counsel can use to prioritize review for example a score above may produce of the relevant documents but constitutes only of the entire document set counsel may decide after sampling and quality control tests that documents with a score of below are so highly likely to be irrelevant that no further human review is necessary counsel can also decide the
cost-benefit of manual review of the documents with scores of id the substance of the article was eventually adopted in an opinion that states this judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant esi in appropriate cases 287_frd_182 s d n y adopted sub nom moore v publicis groupe sa no civ alc ajp wl s d n y date respondent asserts that predictive coding should not be used in these cases because it is an unproven technology we disagree although predictive coding is a relatively new technique and a technique that has yet to be sanctioned let alone mentioned by this court in a published opinion the understanding of e-discovery and electronic media has advanced significantly in the last few years thus making predictive coding more acceptable in the technology industry than it may have previously been
we use the term e-discovery to refer to electronic discovery which in turn means the obtaining of esi in the discovery phase of litigation
in fact we understand that the technology industry now considers predictive coding to be widely accepted for limiting e-discovery to relevant documents and effecting discovery of esi without an undue burden see progressive cas ins co v delaney no cv-00678-lrh-pal wl at d nev date stating with citations of articles that predictive coding has proved to be an accurate way to comply with a discovery request for esi and that studies show it is more accurate than human review or keyword searches f d i c v bowden no cv413-245 wl at s d ga date directing that the parties consider the use of predictive coding see generally nicholas barry man versus machine review the showdown between hordes of discovery lawyers and a computer-utilizing predictive-coding technology vand j ent tech l lisa c wood predictive coding has arrived aba antitrust j the use of predictive coding also is not unprecedented in federal litigation see eg hinterberger v catholic health
sys inc no 08-cv-3805 f wl w d n y date in re actos no 11-md- wl w d la date 287_frd_182 where as here petitioners reasonably request to use predictive coding to conserve time and expense and represent to the court that they will retain electronic discovery experts to meet with respondent’s counsel or his experts to conduct a search acceptable to respondent we see no reason petitioners should not be allowed to use predictive coding to respond to respondent’s discovery request cf progressive cas ins co wl at declining to allow the use of predictive coding where the record lacked the necessary transparency and cooperation among counsel in the review and production of esi responsive to the discovery request
predictive coding is so commonplace in the home and at work in that most if not all individuals with an email program use predictive coding to filter out spam email see 287_frd_182 n s d n y adopted sub nom moore v publicis groupe sa no civ alc ajp wl s d n y date
mr scarazzo’s expert testimony supports our opinion
mr wudke did not persuasively say anything to erode or otherwise undercut mr scarazzo’s testimony
he testified that discovery of esi essentially involves a two-step process first the universe of data is narrowed to data that is potentially responsive to a discovery request second the potentially responsive data is narrowed down to what is in fact responsive he also testified that he was familiar with both predictive coding and keyword searching two of the techniques commonly employed in the first step of the two-step discovery process and he compared those techniques by stating key word searching is as the name implies is a list of terms or terminologies that are used that are run against documents in a method of determining or identifying those documents to be reviewed what predictive coding does is it takes the type of documents the layout maybe the whispets of the documents the format of the documents and it uses a computer model to predict which documents out of the whole set might contain relevant information to be reviewed so one of the things that it does is by using technology it eliminates or minimizes some of the human error that might be associated with it sometimes there’s inefficiencies with key word searching in that it may include or exclude documents whereas training the model to go back and predict this we can look at it and use statistics and other sampling information to pull back the information and feel more confident that the information that’s being reviewed is the universe of potentially responsive data he concluded that the trend was in favor of predictive coding because it eliminates human error and expedites review in addition mr scarazzo opined credibly and without contradiction that petitioners’ approach to responding to respondent’s discovery request is the most reasonable way for petitioners to comply with that request petitioners asked mr scarazzo to analyze and to compare the parties’ dueling approaches in the setting of the data to be restored from dynamo’s backup tapes and to opine on which of the approaches is the most reasonable way for petitioners to comply with respondent’s request mr scarazzo assumed as to petitioners’ approach that the restored data would be searched using specific criteria that the resulting information would be reviewed for privilege and that petitioners would produce the nonprivileged information to respondent he assumed as to respondent’s approach that the restored data would be searched for privileged information without using specific search criteria that the resulting privileged information would be removed and that petitioners would then produce the remaining data to respondent as to both approaches he examined certain details of dynamo’s backup tapes interviewed the person most knowledgeable on dynamo’s backup process and the contents of its backup tapes dynamo’s
director of information_technology and performed certain cost calculations mr scarazzo concluded that petitioners’ approach would reduce the universe of information on the tapes using criteria set by the parties to minimize review time and expense and ultimately result in a focused set of information germane to the matter he estimated that big_number to big_number documents would be subject_to review under petitioners’ approach at a cost of dollar_figure to dollar_figure while million to million documents would be subject_to review under respondent’s approach at a cost of dollar_figure to dollar_figure our rules including our discovery rules are to be construed to secure the just speedy and inexpensive determination of every case rule d petitioners may use predictive coding in responding to respondent’s discovery request if after reviewing the results respondent believes that the response to the discovery request is incomplete he may file a motion to compel at that time see rule b d accordingly an appropriate order will be issued
