In simulated trials of 50 patients, the machine-learning model designed treatment cycles that reduced the potency to a quarter or half of the doses It often skipped administration, which were then scheduled twice a year instead of monthly.
Reinforced learning was used to teach the model to favor certain behavior that lead to a desired outcome. A combination of temozolomide and procarbazine, lomustine, and vincristine, administered over weeks or months, were studied.
As the model explored the regimen, at each planned dosing interval it decided on actions. It either initiated or withheld a dose. If it administered, it then decided if the entire dose, or a portion, was necessary. It pinged another clinical model with each action to see if the the mean tumor diameter shrunk.
When full doses were given, the model was penalized, so it instead chose fewer, smaller doses. According to Shah, harmful actions were reduced to get to the desired outcome.
The J Crain Venter Institute’s Nicholas Schork said that the model offers a major improvement over the conventional “eye-balling” method of administering doses, observing how patients respond, and adjusting accordingly.