The World’s First Machine Learning-Derived Outcome Measure

We are all familiar with the legacy scores used to judge patient success after shoulder arthroplasty, such as the ASES, Constant, and UCLA scores. Each of these collects some combination of subjective and objective measurements. They are meant to track patient progress from the pre-operative to the post-operative time point, as well as long-term outcomes.


Stephanie Muh, MD
Deputy Chief of Service in the Department of Orthopaedics
Henry Ford Hospital West Bloomfield

We also use these scores to compare treatment arms, for example reverse total shoulder arthroplasty versus total shoulder arthroplasty. However, there is no gold-standard score specific to shoulder arthroplasty, and none of these legacy scores has predictive value. We therefore cannot tell a patient that a Constant score of x means they can expect a good or great post-operative outcome.

The ASES, Constant, and UCLA scores are the three most common shoulder outcome measures in use today. Because they are not specific to shoulder arthroplasty, however, they add little value for our patients on their own. Individual items within these scores, on the other hand, could be very useful for building a patient-derived predictive model.

Additionally, because there is no gold standard, the National Institutes of Health (NIH) has funded the development of a patient-reported outcome measure, the Patient-Reported Outcomes Measurement Information System (PROMIS). PROMIS measures patient-reported health status across physical, mental, and social well-being. However, it measures only general health and is not shoulder-specific.

For this reason, we developed a novel machine learning-based score, the Shoulder Arthroplasty Smart (SAS) score. Its development drew on more than 250 data points, including the legacy and PROMIS scores, patient demographics, and objective range-of-motion measurements. Machine learning algorithms applied to this data produced two scores: the F-score, which indicates how important a specific data point is, and the Reciprocal Fusion Rank Score, which confirms that the data point is actually useful.
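The talk does not describe exactly how the two rankings were combined. A standard way to merge several feature rankings into one is reciprocal rank fusion (RRF), in which each feature earns 1/(k + rank) from every ranking it appears in. The sketch below assumes, without confirmation from the talk, that the "Reciprocal Fusion Rank Score" works along these lines. The feature names, the two input rankings, and the constant k = 60 are illustrative assumptions, not the published SAS model.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked feature lists into one ordering (standard RRF).

    rankings: list of lists, each ordered from most to least important.
    k: smoothing constant; 60 is a commonly used default for RRF.
    Returns (feature, fused_score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, feature in enumerate(ranking, start=1):
            # Features ranked near the top of any list contribute the most.
            scores[feature] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings, e.g. one ordered by an F-score-style importance
# measure and one from a second model. Not real SAS data.
ranking_a = ["forward_elevation", "pain", "external_rotation", "age"]
ranking_b = ["pain", "abduction", "forward_elevation", "external_rotation"]

for feature, score in reciprocal_rank_fusion([ranking_a, ranking_b]):
    print(f"{feature}: {score:.4f}")
```

Because RRF uses only rank positions, it can merge importance measures that live on completely different numeric scales, which is one reason it is a popular choice for this kind of consensus ranking.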

We then identified several measures common to the ASES, Constant, and UCLA scores and used them to build the SAS score. These include three objective range-of-motion measurements, three subjective patient-reported pain and function scores, and a calculated functional range-of-motion score.


To validate the score, we evaluated more than 3,500 patients with a minimum two-year follow-up, representing more than 8,000 post-operative visits. Patients with fractures, revisions, or hemiarthroplasty were excluded. We then compared the new scoring system with five legacy scores.

To check validity, we performed a floor and ceiling analysis. The Constant score showed a pre-operative floor effect, while the other four legacy scores showed post-operative ceiling effects. In other words, once a patient's post-operative score reached the top of the scale, these legacy scores could no longer detect any further improvement, so they may not fully reflect patient outcomes. The SAS score, however, showed neither a floor nor a ceiling effect.
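A floor or ceiling analysis usually comes down to counting how many patients score at the lowest or highest possible value. The sketch below assumes the common psychometric convention that an effect is present when more than 15% of patients sit at the floor or ceiling; the talk does not state which threshold the study used, and the scores are made up for illustration.

```python
def floor_ceiling(scores, min_score, max_score, threshold_pct=15.0):
    """Share of patients at the scale floor and ceiling.

    threshold_pct: percentage above which an effect is flagged
    (15% is a widely used convention, assumed here).
    """
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s <= min_score) / n
    ceiling_pct = 100 * sum(1 for s in scores if s >= max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold_pct,
        "ceiling_effect": ceiling_pct > threshold_pct,
    }


# Hypothetical post-operative scores on a 0-100 scale (not study data).
post_op = [100, 100, 95, 100, 88, 100, 72, 100, 91, 100]
print(floor_ceiling(post_op, min_score=0, max_score=100))
```

Running the same function on pre-operative scores is how a floor effect like the one described for the Constant score would surface.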

Second, we examined whether the floor and ceiling effects introduced bias into the legacy scores. We found that the ceiling effect was more common in men and varied by race, ethnicity, and age. Again, the SAS score showed no ceiling effect.
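One simple way to see whether a ceiling effect falls unevenly across patient groups is to compute the ceiling rate separately for each subgroup. The sketch below uses made-up scores labelled by sex; a real analysis would also test whether the differences between groups are statistically significant (for example with a chi-square test), which this sketch omits.

```python
from collections import defaultdict


def ceiling_rate_by_group(records, max_score):
    """Percentage of patients scoring at the scale maximum, per subgroup.

    records: iterable of (group_label, score) pairs.
    """
    totals = defaultdict(int)
    at_ceiling = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score >= max_score:
            at_ceiling[group] += 1
    return {g: 100 * at_ceiling[g] / totals[g] for g in totals}


# Hypothetical post-operative scores on a 0-100 scale (not study data).
records = [
    ("male", 100), ("male", 100), ("male", 92), ("male", 100),
    ("female", 100), ("female", 85), ("female", 78), ("female", 94),
]
print(ceiling_rate_by_group(records, max_score=100))
```

The same grouping works for race, ethnicity, or age bands by changing the label attached to each score.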

Next, we checked whether the SAS score correlated with the legacy scores. We found that it was highly correlated, meaning that the same patient receives similar relative scores on all metrics.
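The talk does not say which correlation statistic was used. Pearson's r is sketched below as one common choice (Spearman's rank correlation is another); the paired scores are hypothetical and stand in for the same patients measured on two instruments.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical paired scores for the same five patients (not study data).
sas = [42, 55, 63, 78, 90]
ases = [38, 50, 66, 75, 93]
print(round(pearson_r(sas, ases), 3))
```

A value close to 1 means patients keep roughly the same relative standing on both scores, which is the property described above.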

Once validity was established, we assessed responsiveness: does the SAS score detect change appropriately? The UCLA score was the most responsive, and the SAS score ranked second, alongside the ASES score. We therefore concluded that the SAS score detects change adequately.
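Responsiveness is often summarized with the standardized response mean (SRM), the mean pre-to-post change divided by the standard deviation of that change. The talk does not name the index the study used, so the SRM here is an assumption, and the paired scores are made up.

```python
from statistics import mean, stdev


def standardized_response_mean(pre, post):
    """SRM = mean(post - pre) / SD(post - pre) for paired scores."""
    changes = [b - a for a, b in zip(pre, post)]
    return mean(changes) / stdev(changes)


# Hypothetical pre- and post-operative scores for six patients.
pre = [30, 42, 25, 50, 38, 45]
post = [78, 85, 70, 88, 80, 92]
print(round(standardized_response_mean(pre, post), 2))
```

Because the SRM is unitless, it lets instruments with different scales be ranked against one another, as in the UCLA, SAS, and ASES comparison. A common rule of thumb reads values of 0.8 or more as a large response.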

Lastly, we asked whether the new score was clinically relevant. The SAS score has a minimal clinically important difference (MCID) and substantial clinical benefit (SCB) similar to those of the Constant score, and on that basis we consider it a useful new score.
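MCID and SCB thresholds are frequently estimated with an anchor-based method: patients answer a global rating question, and the mean score change within each improvement category becomes the threshold. The sketch below assumes a simple three-level anchor ("same", "slightly better", "much better") and uses made-up changes. This is only one of several published definitions; distribution-based methods, such as half a standard deviation of baseline scores, are another option, and the talk does not say which the study used.

```python
from statistics import mean


def anchor_based_thresholds(changes, anchors):
    """Estimate MCID and SCB from score changes and an anchor question.

    changes: post-op minus pre-op score for each patient.
    anchors: each patient's answer to a global rating question, assumed
             here to be "same", "slightly better", or "much better".
    MCID = mean change among "slightly better" patients;
    SCB  = mean change among "much better" patients.
    """
    slightly = [c for c, a in zip(changes, anchors) if a == "slightly better"]
    much = [c for c, a in zip(changes, anchors) if a == "much better"]
    return {"MCID": mean(slightly), "SCB": mean(much)}


# Hypothetical changes and anchor answers (not study data).
changes = [2, 9, 11, 10, 25, 31, 28, 1]
anchors = ["same", "slightly better", "slightly better", "slightly better",
           "much better", "much better", "much better", "same"]
print(anchor_based_thresholds(changes, anchors))
```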


In conclusion, this is the largest study to quantify the psychometric properties of clinical shoulder arthroplasty outcome measures. The SAS score is a simple, six-item scoring system that has proven valid, responsive, and clinically useful.

The legacy scores, on the other hand, show either a floor or a ceiling effect that limits their validity, as well as age, gender, and race or ethnicity bias. We therefore believe the legacy scores have an insufficient response range for measuring patient outcomes.

The SAS score is the first clinical outcome metric derived using machine learning. Based upon these positive results, which have been previously published, we recommend the novel Shoulder Arthroplasty Smart (SAS) score to quantify outcomes for shoulder arthroplasty patients.

I will also note that we are aware the calculated composite range-of-motion score can be difficult to compute, so we have made a website available to surgeons. Please visit us at www.smartshoulderscore.com.

In this Masters Course video, Stephanie Muh, MD, reviews the science behind the new Shoulder Arthroplasty Smart Score.

Stephanie Muh, MD, is deputy chief of service in the department of orthopaedics at Henry Ford Hospital West Bloomfield where she specializes in shoulder and elbow reconstruction, rotator cuff repair and arthritis. Dr. Muh completed her residency in orthopaedic surgery at the Henry Ford Hospital and shoulder and elbow fellowship at Case Western Reserve University/University Hospitals of Cleveland.
