Developing and Testing the Agency for Healthcare Research and Quality’s National Guideline Clearinghouse Extent of Adherence to Trustworthy Standards (NEATS) Instrument


Clinical practice guidelines (CPGs) represent an important means for medical societies, health care institutions, governments, health departments, and other health care entities to provide guidance on issues of clinical care and policy. Yet CPGs vary in quality, which is a problem for health care overall because users may be unable to discern the basis for recommendations or whether a guideline is trustworthy. At the request of the U.S. Congress, the Agency for Healthcare Research and Quality (AHRQ) contracted with the Institute of Medicine (IOM) (now the National Academy of Medicine) to create standards for developing CPGs. In March 2011, the IOM published the report Clinical Practice Guidelines We Can Trust (1), which required that a CPG’s recommendations be informed by a systematic review of the evidence. The report laid out 8 standards of trustworthiness relating to evaluating evidence foundations, ensuring transparency and rigor in the guideline development process, incorporating patient and public perspectives, mitigating bias and conflicts of interest, and remaining up to date. The IOM also recommended that AHRQ have the National Guideline Clearinghouse (NGC) indicate the extent to which CPGs adhere to the standards set forth in the report. In 2013, NGC revised its inclusion criteria to incorporate the updated IOM definition of a CPG, which resulted in acceptance of fewer guidelines (2). In October 2017, NGC began displaying the extent to which guidelines in its database adhered to the IOM standards for trustworthiness, using the NGC Extent of Adherence to Trustworthy Standards (NEATS) instrument. This article summarizes the development of the instrument, which built on the framework of the AGREE II (Appraisal of Guidelines for Research and Evaluation II) tool (3-7).

Methods

The AHRQ’s NEATS instrument was developed by ECRI Institute with input from AHRQ and the NGC editorial board (Supplement). The internal work group comprised Kathleen Lohr, PhD; Paul Shekelle, MD, PhD; Richard Shiffman, MD, MCIS; and Craig Robbins, MD, MPH; it provided additional input and feedback on anchoring the scales for individual items and on evaluating the NEATS instrument.

Selecting IOM Standards

To determine which IOM standards to include in the NEATS instrument, the expert panel members (Supplement), NGC editorial board, and NGC staff ranked the IOM standards according to importance and priority for implementation on the basis of their experience and expertise. After discussing averaged individual rankings, we weighted the standards according to the difficulty of implementation by developers and the likelihood that adherence would be documented in the CPG or supplemental documents. The NEATS instrument included the top-scoring standards, which incorporated aspects of all 8 major standards. Some of the IOM substandards within the 8 major standards represent important ideals that were difficult to include as assessment criteria as written (for example, "Members of the GDG [guideline development group] should divest themselves of financial investments they or their family members have in, and not participate in marketing activities or advisory boards of, entities whose interests could be affected by CPG recommendations" [1]). The IOM substandards selected through this process were deemed collectively to be of the highest priority and the most practical for initial implementation.
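The ranking-and-weighting step described above can be summarized arithmetically. The sketch below is illustrative only: the standards listed, the number of raters, the ranks, and the feasibility weights are hypothetical and are not the values used in the actual NGC exercise; it simply shows how averaged ranks might be combined with a feasibility weight to score and order standards.

from statistics import mean

# Each rater ranks the candidate standards (1 = highest priority).
# All values below are hypothetical.
rankings = {
    "Evidence foundations": [1, 2, 1],
    "Transparency": [2, 1, 3],
    "Conflict of interest management": [3, 3, 2],
    "Updating": [4, 4, 4],
}

# Feasibility multipliers reflecting ease of implementation and the likelihood
# that adherence is documented in the guideline (assumed values).
feasibility = {
    "Evidence foundations": 1.0,
    "Transparency": 0.9,
    "Conflict of interest management": 0.8,
    "Updating": 0.6,
}

def priority_score(standard):
    """Lower average rank and higher feasibility yield a higher score."""
    return feasibility[standard] / mean(rankings[standard])

# Standards ordered from highest to lowest priority score.
for standard in sorted(rankings, key=priority_score, reverse=True):
    print(f"{standard}: {priority_score(standard):.2f}")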
To translate these concepts into a cohesive instrument, we needed to adapt them for practical implementation.

We then began evaluating existing instruments for guideline appraisal to ascertain whether they could be adapted to our needs.

Evaluating Existing Instruments

To identify relevant existing tools for CPG appraisal, a medical librarian at ECRI searched the literature using a protocol that included both bibliographic and gray literature resources: PubMed, Medline, and Embase, as well as Google Scholar and Scopus for recursive citation searching. Searches of bibliographic databases incorporated controlled vocabulary terms from MeSH and Emtree, as well as keywords. We ultimately identified several tools for guideline appraisal, including those by Cluzeau and colleagues (8), Hayward and colleagues (9), Shaneyfelt and colleagues (10), Shiffman and colleagues (11), Mitchell and colleagues (12), and the AGREE Next Steps Consortium (3-7). Next, we compared these tools with the selected IOM standards of interest. No instrument fully suited the specific needs of NGC (namely, a concise tool focused on the IOM standards), but AGREE II contained elements that we could adapt to meet our needs. We also used scientific citation mapping to evaluate the various instruments and identified the AGREE or AGREE II tool as the predominant tool for guideline appraisal in the literature. At 2 different time points, we discussed our use and modification of the AGREE II tool with members of the AGREE Trust as part of our efforts to develop an instrument that would be suitable for NGC and focus specifically on the IOM standards.

Generating Items

We assessed the AGREE II tool, highlighting gaps and concept mismatches between its items and the selected IOM standards. Staff from NGC selected relevant items from AGREE II and modified them to reflect the standards. Some IOM standards had no appropriate corresponding items in AGREE II (for example, recommendations should be rated for strength in light of the level of confidence in the evidence), so we created new items for NEATS (such as item 9, rating or grading the strength of recommendations). Of the 15 NEATS items, 4 (items 3b, 6, 9, and 11) were new additions or considerable alterations of items in the AGREE II tool, whereas 6 others (items 2, 4, 5a, 5b, 5c, and 10) were modifications of AGREE II items. Each NEATS item is based on a specific IOM principle, but some take either a broader or a simpler approach. Because NGC contains diverse guidelines, we tailored the NEATS rating criteria as necessary so that the NGC team could later implement the tool. Over multiple rounds (by conference call and written communication), the internal work group provided feedback on refining concept consistency, item validity, clarity, relevancy, and ease of use. The larger NGC editorial board and AHRQ also discussed the tool. Using a modified Delphi process, 13 thought leaders from the NGC editorial board provided 11 rounds of input and feedback on the development of the NEATS instrument, including specific input on individual item generation and scaling. The Delphi process was modified so that prioritization was not driven entirely by consensus, in order to account for the priorities of AHRQ and practical constraints.

Scaling and Anchoring Items

We evaluated each generated item for relevant and appropriate scales. For most items, we used a 5-point Likert scale from 1 (lowest) to 5 (highest) to identify degrees of adherence to the rating criteria related to the IOM standard. For 3 items about transparency and GDG composition, we used a scale of yes, no, or unknown. For items that used a Likert scale, we identified examples of highest (5 points), middle (3 points), and lowest (1 point) adherence as scale anchors for each item.
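This scaling scheme can be represented compactly. The Python sketch below is illustrative only: item 9 and its concept are taken from the text, but the categorical item shown and the anchor wording are hypothetical placeholders rather than the actual NEATS rating criteria, which are defined in the NEATS user guide.

from dataclasses import dataclass
from typing import Optional

@dataclass
class NeatsItem:
    number: str
    concept: str
    scale: str                      # "likert" (1-5) or "categorical" (yes/no/unknown)
    anchors: Optional[dict] = None  # defined only for Likert items

items = [
    NeatsItem(
        number="9",
        concept="Rating or grading the strength of recommendations",
        scale="likert",
        anchors={5: "fully meets the rating criteria",      # paraphrased anchor
                 3: "partially meets the rating criteria",   # paraphrased anchor
                 1: "does not meet the rating criteria"},    # paraphrased anchor
    ),
    NeatsItem(
        number="T1",  # hypothetical label for one of the 3 transparency/GDG items
        concept="Disclosure of the guideline funding source",
        scale="categorical",
    ),
]

def valid_rating(item, rating):
    """Check that a reviewer's rating is allowed for the item's scale."""
    if item.scale == "likert":
        return rating in {1, 2, 3, 4, 5}
    return rating in {"yes", "no", "unknown"}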
Evaluating and Field-Testing the NEATS Instrument

We assessed the NEATS instrument on 3 levels: evaluation of interrater reliability, evaluation of external validity, and field-testing. For the evaluation of interrater reliability, 3 trained NGC staff reviewers assessed a purposive sample of 21 guidelines, with dual review of each. The sample was selected to include guidelines from various developer organizations (large and small, in various specialties [primary care and specialty care; surgical and nonsurgical], and domestic and international) and on various topics (pediatric and adult care; inpatient and outpatient care; and screening, diagnostic, and therapeutic topics). We also tested the basic usability of the instrument and solicited specific feedback on its functionality from the reviewers.

We evaluated the external validity of the instrument with 10 external stakeholders who were experts in the field of guideline development (9 persons who were not federal employees and 1 federal employee). These guideline developers, guideline researchers, systematic reviewers, and clinicians assessed each NEATS item as well as the overall instrument. Using a questionnaire developed by NGC, we asked all stakeholders whether each item on the instrument was a good measure of the corresponding IOM concept; whether each item should be included, modified, or deleted; whether the instrument overall was suitable for its intended purpose; and whether the instrument overall provided useful information about the trustworthiness of a guideline.

The 10 external stakeholders also field-tested the NEATS instrument. We provided the instrument, the user guide, detailed instructions, and a guideline to review, and we asked them to evaluate the guideline using the instrument and to provide feedback on the instrument itself. We made subsequent refinements to the NEATS instrument on the basis of all 3 levels of evaluation and additional feedback from the NGC editorial board. To create a format for displaying the assessments on the NGC Web site, NGC staff worked with AHRQ, the editorial board, and NGC’s information technology subcontractor to develop qualitative descriptors and a graphic display.

NEATS Instrument Training and Retesting

Internal reviewers were trained on the NEATS instrument both individually and in groups to ensure a common understanding of the boundaries between rating scores for individual items.
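The dual-review design described above lends itself to a chance-corrected agreement statistic for paired ordinal ratings. The article does not state which statistic was used, so the sketch below assumes a quadratic-weighted Cohen's kappa and uses made-up ratings solely to illustrate the computation; none of the numbers reflect actual NGC reliability data.

import numpy as np

def weighted_kappa(r1, r2, n_categories=5):
    """Quadratic-weighted Cohen's kappa for two raters on an ordinal 1-to-5 scale."""
    r1 = np.asarray(r1) - 1                                   # shift ratings to 0-based indices
    r2 = np.asarray(r2) - 1
    observed = np.zeros((n_categories, n_categories))
    for a, b in zip(r1, r2):
        observed[a, b] += 1
    observed /= observed.sum()                                # joint rating proportions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_categories, n_categories))
    weights = (i - j) ** 2 / (n_categories - 1) ** 2          # quadratic disagreement weights
    return 1 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical paired ratings on one Likert item for 21 dually reviewed guidelines.
rater_a = [5, 4, 4, 3, 5, 2, 3, 4, 5, 5, 3, 4, 2, 5, 4, 3, 5, 4, 3, 5, 4]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4, 3, 4, 2, 5, 4, 3, 5, 4, 2, 5, 4]

print(f"weighted kappa = {weighted_kappa(rater_a, rater_b):.2f}")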