Multilevel Modeling of Resection Accuracy: Insights from 10,144 Clinical Cases Using a Contemporary Computer-Assisted Total Knee Arthroplasty System (Abridged Version)

Clinical Contributors

James I. Huddleston III, MD
Stanford University Medical Center
Redwood City, CA

Bernard N. Stulberg, MD
Saint Vincent Charity Medical Center
Cleveland, OH

Technical Contributors

Laurent Angibaud, Dipl. Ing.
Exactech, Inc.

Charlotte Bolch
Exactech, Inc.

Yifei Dai, PhD
Exactech, Inc.

Cyril Hamad, Dipl. Ing.
Blue Ortho

Amaury Jung, Dipl. Ing.
Blue Ortho


As a successful treatment for advanced inflammatory and degenerative knee arthritis, total knee arthroplasty (TKA) is projected to expand by 600% to more than three million cases annually by 2030.1 With this rapid growth, the expected increase in revision TKA cases could impose a substantial financial burden on both patients and society. Inaccurate surgical resections and the resultant malalignment are among the most common reasons for TKA failure.2 These etiologies may also contribute to the phenomenon of 20% “unhappy” patients,3,4 as they have been shown to lead to worse functional and clinical outcomes compared to those of well-aligned knees.5-8

Numerous studies have confirmed the benefit of computer-assisted orthopedic surgery (CAOS) in improving the accuracy of bony resection and limb alignment.9,10 However, the existing studies share some common shortcomings: 1) they are not sufficiently powered to investigate geographic and inter-surgeon variance; 2) limited data are available on the “learning curve” a surgeon goes through during early adoption before gaining the full benefit of the technology; 3) the longitudinal performance of a specific CAOS system over time has been overlooked, despite constant software and hardware updates being standard practice for most marketed systems; and 4) even though published meta-analyses offer global reviews of CAOS technology, device differences and associated technical variations, which lead to significant differences in the accuracy reported between CAOS systems, are by nature excluded from those analyses.11

It is unquestionably difficult to initiate clinical studies that encompass sufficient cases for the assessment of individual factors that may influence accuracy. Current cloud-based infrastructure now allows the archiving of technical data without the need to access specific patient information, enabling comprehensive accuracy assessments based on a large number of cases performed with a given CAOS system. However, such large datasets are often accumulated at multiple levels (hierarchically structured), posing a unique challenge for analysis because they may violate the assumptions of common analytic methods such as linear regression. Multilevel modeling offers several advantages in addressing this challenge,12 including: 1) individual observations are not required to be independent; and 2) the effects of both individuals and specific groups on the outcome of interest can be analyzed comprehensively and concurrently. This methodology has been applied to large datasets to assess variation in healthcare data across multiple categories, such as geographic region, socioeconomic status, and different attributes of care networks.13-15
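As a minimal sketch of why hierarchical data break the independence assumption, a two-level model with no predictors can be written as below. The notation is introduced here for illustration only and is not taken from the original analysis: $y_{ij}$ is the resection error of case $i$ in group $j$ (for example, a surgeon), $\gamma_{00}$ is the overall mean error, $u_j$ is the group-level deviation, and $e_{ij}$ is the case-level residual.

```latex
\begin{aligned}
y_{ij} &= \gamma_{00} + u_{j} + e_{ij}, \\
u_{j} &\sim \mathcal{N}(0, \tau^{2}), \qquad e_{ij} \sim \mathcal{N}(0, \sigma^{2}), \\
\operatorname{Cov}(y_{ij}, y_{i'j}) &= \tau^{2} \quad (i \neq i').
\end{aligned}
```

Because two cases from the same group share the term $u_j$, their errors are correlated whenever $\tau^{2} > 0$. Ordinary linear regression assumes this covariance is zero, whereas the multilevel model estimates $\tau^{2}$ directly.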

By integrating the above-described concepts of CAOS system-specific accuracy performance, consideration of multiple factors that may impact accuracy, and methods for analyzing hierarchical data, this study aimed to apply multilevel modeling to assess resection accuracy across the entire TKA application history of a modern CAOS system. Specifically, the authors sought to determine the impact on accuracy of 1) geographic region; 2) inter-surgeon difference; 3) surgeon’s adoption of the technology (learning vs proficiency); 4) preoperative mechanical alignment status; and 5) historical progression of the CAOS application (software versions).

Materials and Methods

A retrospective review was conducted using a proprietary cloud-based web archive of all TKAs performed with a modern imageless CAOS system (ExactechGPS®, Blue-Ortho, Gieres, FR). All completed cases are stored as de-identified reports that contain only technical information on the surgery (no patient information of any sort). Similarly, all surgeons are de-identified, with only their geographic information (country of practice) available. A set of grouping categories was identified as variables that might affect alignment accuracy, including geographic regions, inter-user differences across established surgeons (surgeons with at least 50 cases of experience with the CAOS system), adoption phases, preoperative mechanical alignment status, and versions of the CAOS software application (Table 1).

Table 1. Grouping variables for the assessment of variability.

*The selection of ≥ 50 cases to define established surgeons was based on consideration of maintaining sufficient sample size per category.

The following surgical parameters were extracted (Figure 1): 1) planned resection: the resection parameters determined by the surgeon prior to the bony resection, reflecting the surgeon’s resection targets for the CAOS guidance; and 2) checked resection: the actual bony resection surfaces, digitized after the cut using an instrumented checker.

Figure 1. A) Alignment target was planned before bony resection, and B) digitized after bony resection using an instrumented checker. Resection error in alignment was calculated as the deviation from the planned resection to the checked resection.
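The error calculation described in the caption can be sketched in a few lines of Python. This is an illustrative sketch only: the `Resection` record, its field names, and the sign convention (checked minus planned) are assumptions made here, and the sample angles are invented. The 2° tolerance is the acceptance criterion stated in the Methods.

```python
from dataclasses import dataclass
from typing import Sequence

TOLERANCE_DEG = 2.0  # acceptance tolerance stated in the Methods


@dataclass
class Resection:
    """Hypothetical record for one coronal resection (tibia or femur)."""
    planned_deg: float  # surgeon's planned angle relative to the mechanical axis
    checked_deg: float  # angle digitized with the instrumented checker


def resection_error(r: Resection) -> float:
    """Deviation from the planned to the checked resection (sign convention assumed)."""
    return r.checked_deg - r.planned_deg


def is_acceptable(r: Resection, tol: float = TOLERANCE_DEG) -> bool:
    """A resection is acceptable when the absolute error is no more than tol."""
    return abs(resection_error(r)) <= tol


def percent_acceptable(cases: Sequence[Resection], tol: float = TOLERANCE_DEG) -> float:
    """Share of cases, in percent, whose resection error is within tol."""
    if not cases:
        raise ValueError("at least one case is required")
    return 100.0 * sum(is_acceptable(c, tol) for c in cases) / len(cases)


# Invented example values, for illustration only.
example_cases = [
    Resection(planned_deg=0.0, checked_deg=0.5),
    Resection(planned_deg=0.0, checked_deg=-2.0),
    Resection(planned_deg=1.0, checked_deg=3.5),
    Resection(planned_deg=-1.0, checked_deg=-0.5),
]
print(percent_acceptable(example_cases))
```

Applying `percent_acceptable` separately to each surgeon, region, or software version gives the kind of per-category acceptance rates that the study reports.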

Resection errors (accuracy) were assessed between the planned and actual resections in the coronal plane referencing the mechanical axis, for both the tibia and femur. A resection was considered acceptable if there was no more than 2° of error. Unconditional multilevel modeling was applied to understand whether and where the variability was located in the resection errors in both tibia and femur with regard to the grouping categories. For each model, level-1 and level-2 variances, as well as the intraclass correlation (ICC) were computed. Specifically, the following questions were explored:

  1. Does significant variability exist in resection errors in any grouping category(-ies)?
  2. If variability is found to exist in a grouping category, is it clinically meaningful?

The first question was answered by testing for significance (p < 0.05) with a z test on the level-2 variance estimate for each grouping category. To answer the second question, an intraclass correlation coefficient (ICC) greater than the variability commonly seen in observational studies (reported as ICC = 0.15–0.25)16 was taken to indicate meaningful variability in alignment accuracy for the associated grouping category.
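The two decision rules above can be sketched as follows, given variance estimates that have already been produced by a fitted multilevel model. The function names are hypothetical, and two choices are assumptions made here because the text does not specify them: a one-sided (upper-tail) p value for the variance test, and the lower bound of the reported range (0.15) as the benchmark for "meaningful". The ICC is taken as the level-2 variance divided by the sum of the level-1 and level-2 variances.

```python
import math
from typing import Optional, Tuple


def icc(level2_var: float, level1_var: float) -> float:
    """Intraclass correlation: share of total variance located between groups."""
    total = level2_var + level1_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return level2_var / total


def wald_z(estimate: float, std_error: float) -> Tuple[Optional[float], Optional[float]]:
    """z statistic and one-sided p value for a variance estimate.

    Returns (None, None) when no usable standard error exists, in which
    case no hypothesis test is carried out.
    """
    if std_error <= 0:
        return None, None
    z = estimate / std_error
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p


def assess(level2_var: float, level2_se: float, level1_var: float,
           alpha: float = 0.05, icc_benchmark: float = 0.15) -> dict:
    """Apply both questions to one grouping category (thresholds are assumptions)."""
    z, p = wald_z(level2_var, level2_se)
    rho = icc(level2_var, level1_var)
    return {
        "icc": rho,
        "z": z,
        "p": p,
        "significant": p is not None and p < alpha,
        "meaningful": rho > icc_benchmark,
    }
```

A category can therefore be statistically significant without being meaningful: a level-2 variance that is small relative to the level-1 variance yields a low ICC even when its z test rejects zero.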


Results

A total of 10,144 CAOS TKA cases were reviewed. Overall, the percentages of cases with acceptable coronal alignment were 97.9% and 97.2% for the tibia and femur, respectively. The alignment results exhibited excellent accuracy across all established surgeons (acceptable resections ranged between 92% and 100% of the cases, Figure 2). For both tibia and femur, greater than 95% of the cases exhibited acceptable resections across geographic regions, adoption phases, preoperative alignment categories, and CAOS application versions (Figure 3).

Figure 2. Percentage of acceptable resections (≤2° alignment error) across individual surgeons.

Figure 3. Percentage of acceptable resections (≤2° alignment error) across grouping categories of geographic region, adoption phase, preoperative alignment, and application version.

Geographic region, CAOS software application version, preoperative alignment, and adoption phase (learning/proficient) each accounted for a negligible share of the total variability in resection errors for both tibia and femur (nonsignificant z tests on level-2 variance estimates, ICC values < 0.004, Table 2). Although significant variability was found among individual surgeons (p values ≤ 0.001), the associated ICC values (0.02 and 0.07 in tibia and femur resection errors, respectively) were lower than the variability commonly seen in observational studies.16

Table 2. Variance estimates and ICC values for level-2 variables from the multilevel models. Note that extremely low (0.0000) variance estimates were found for some grouping variables, with no meaningful standard error. For these, the z value and p value were not calculated, as the data did not support a hypothesis test (z test).


Discussion

Numerous studies have shown that malalignment in the coronal plane can lead to various complications in TKA, such as component loosening and instability, polyethylene wear, and patellar dislocation.5-7 Despite the consensus on the importance of alignment accuracy, only 70–80% of conventionally instrumented TKA cases achieve satisfactory lower limb alignment (within 3° of varus/valgus relative to the mechanical axis).17,18 In contrast, this study demonstrated excellent accuracy in bony resection alignment achieved with the modern CAOS system studied. Furthermore, the resection accuracy was not sensitive to geographic region, inter-surgeon difference, learning period, preoperative mechanical alignment status, or CAOS software application version.

To the authors’ knowledge, this is the first data analysis applying advanced statistical modeling to assess the accuracy of a specific CAOS system across all cases in its application history, comprehensively considering factors that may influence the bony resection alignment. All surgeons, geographic regions, preoperative alignment conditions, software versions, and phases of adoption were assessed, rather than a selected subset, making this analysis a robust and unbiased review of the accuracy performance of this CAOS system.

Researchers have questioned the accuracy of limb alignment measures based on standard long-leg standing load-bearing radiographs, as they may be compromised by image quality, inter- and intra-observer variability, and rotation of the limb or an oblique beam direction. Although three-dimensional computed tomography (CT) analysis is suggested for a more accurate alignment measurement,19 a universal CT evaluation for all patients in this study was impractical. The choice of intraoperative instrumented measurement provided a consistent and accurate 3-D method for assessing resection alignment accuracy.

In conclusion, this study applied an advanced statistical tool to provide a comprehensive, clinically relevant evaluation of a modern CAOS system for total knee arthroplasty. The analysis considered potential impact from an extensive list of factors for a thorough understanding of resection errors based on a large data set collected through the application history of the system. The analysis outcomes demonstrated that the studied modern CAOS system offers an accurate and precise solution to help the surgeon achieve their surgical resection goal.

*The full version of this article can be found in Volume 27, Issue 3 of The Knee.


References

  1. Kurtz SM, et al. Projections of primary and revision hip and knee arthroplasty in the United States from 2005 to 2030. J Bone Joint Surg Am 2007;89(4):780–5.
  2. Schroer WC, et al. Why are total knees failing today? Etiology of total knee revision in 2010 and 2011. J Arthroplasty 2013;28(8 Suppl):116–9.
  3. Bourne RB, et al. Patient satisfaction after total knee arthroplasty: who is satisfied and who is not? Clin Orthop Relat Res 2010;468(1):57–63.
  4. Scott CE, et al. Predicting dissatisfaction following total knee replacement: a prospective study of 1217 patients. J Bone Joint Surg Br 2010;92(9):1253–8.
  5. Choong PF, Dowsey MM, Stoney JD. Does accurate anatomical alignment result in better function and quality of life? Comparing conventional and computer assisted total knee arthroplasty. J Arthroplasty 2009;24(4):560–9.
  6. Blakeney WG, Khan RJ, Palmer JL. Functional outcomes following total knee arthroplasty: a randomised trial comparing computer-assisted surgery with conventional techniques. Knee 2014;21(2):364–8.
  7. Huang NF, et al. Coronal alignment correlates with outcome after total knee arthroplasty: five-year follow-up of a randomized controlled trial. J Arthroplasty 2012;27(9):1737–41.
  8. Longstaff LM, et al. Good alignment after total knee arthroplasty leads to faster rehabilitation and better function. J Arthroplasty 2009;24(4):570–8.
  9. Brin YS, et al. Imageless computer assisted versus conventional total knee replacement. A Bayesian meta-analysis of 23 comparative studies. Int Orthop 2011;35(3):331–9.
  10. Hetaimish BM, et al. Meta-analysis of navigation vs conventional total knee arthroplasty. J Arthroplasty 2012;27(6):1177–82.
  11. Carli A, et al. Inconsistencies between navigation data and radiographs in total knee arthroplasty are system dependent and affect coronal alignment. Can J Surg 2014;57(5):305–13.
  12. Osborne JW. Advantages of hierarchical linear modeling. Pract Assess Res Eval 2000;7(1):1–4.
  13. Sizmur S. Multilevel analysis of inpatient experience. Report from Picker Institute Europe; 2011 March. https://www.picker.org/wp-content/uploads/2014/10/Multi-level-analysis-of-inpatient-experience.pdf.
  14. Lumme S, Leyland AH, Keskimaki I. Multilevel modeling of regional variation in equity in health care. Med Care 2008;46(9):976–83.
  15. Uddin S. Exploring the impact of different multi-level measures of physician communities in patient-centric care networks on healthcare outcomes: a multi-level regression approach. Nat Sci Rep 2016;6(20222):1–10.
  16. Hedges LV, Hedberg EC. Intraclass correlation values for planning group-randomized trials in education. Educ Eval Policy Anal 2007;29(1):60–87.
  17. Ritter MA. The anatomical graduated component total knee replacement: a long-term evaluation with 20-year survival analysis. J Bone Joint Surg Br 2009;91:745–9.
  18. Bourne RB, et al. Patient satisfaction after total knee arthroplasty: who is satisfied and who is not? Clin Orthop Relat Res 2010;468(1):57–63.
  19. Ueyama H, et al. Two-dimensional measurement misidentifies alignment outliers in total knee arthroplasty: a comparison of two- and three-dimensional measurements. Knee Surg Sports Traumatol Arthrosc 2019;27(5):1497–503.