Researcher: 1
Data Scientist: 2
Monday, August 25, 2025

10:22 — 2:
Hey, @1
We tested the lib in an initiative we have here on scheduled retraining.
File sent: pix_case.html
Pretty interesting.
We noticed that even a more recently trained model doesn’t show a statistically significant difference.

10:24 — 1:
Good morning!
Wow, that’s great news @2, I’m so happy about that!
Was the usage clear enough?
Did you have any trouble applying it, or did it go smoothly?

10:25 — 2:
Yes!
Zero difficulty!!

10:25 — 1:
Awesome!!!
Just curious, at what point do you guys run the test and generate the HTML?

10:52 — 2:
We fitted two models.
The idea is to fit more, we’re just waiting for the runs to finish.
One model was trained with data up to XX-XX, and the other with data up to YY-YY.
Then we tested whether one performed better than the other because it had more recent data.
The goal is to check if we actually gain anything from having scheduled retraining.
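
A minimal sketch of the comparison described above, assuming both models are scored on the same held-out test set and that per-example correctness flags (1 = correct, 0 = wrong) are available for each. The chat does not say which tests the lib runs, so a paired permutation test is used here purely as an illustration; the function name `paired_permutation_test` and the sample data are hypothetical and are not part of the lib.

```python
import random


def paired_permutation_test(correct_a, correct_b, n_permutations=10_000, seed=0):
    """Two-sided paired permutation test on the difference in accuracy.

    correct_a / correct_b hold 0/1 correctness flags for two models on the
    SAME test examples, in the same order. Returns an estimated p-value.
    Illustrative only: this is not the lib's implementation.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same examples")
    if not correct_a:
        raise ValueError("at least one test example is required")

    rng = random.Random(seed)
    # Per-example difference: +1 if only A is right, -1 if only B is right.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))

    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the model labels are exchangeable,
        # so each per-example difference may flip sign with probability 0.5.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical flags for a model trained up to the older cut-off (old)
    # and one trained up to the more recent cut-off (new).
    old = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    new = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
    print(f"p-value: {paired_permutation_test(old, new):.3f}")
```

A large p-value here would match the observation in the chat: the more recent model is not measurably better, so the retraining schedule may not be buying anything at that cadence.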

=====================================================================================================

Researcher: 1
Data Scientist: 3

Monday, August 25, 2025

12:07 — 1:
You add both test datasets,
and then when adding the context, you link the model and the corresponding test dataset (using its identifier name).
So it worked out?

12:08 — 3:
Yep, it worked great — thanks!!
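
The linking step described above (register each test dataset under a name, then attach each model to its dataset by that name) can be sketched as follows. This is only an illustration of the idea: `ComparisonSetup`, `add_test_dataset`, `add_context`, and `accuracy` are hypothetical names, not the lib's actual API, and the models are stand-in callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], int]


@dataclass
class ComparisonSetup:
    """Hypothetical registry linking models to named test datasets."""

    datasets: Dict[str, Tuple[List[Row], List[int]]] = field(default_factory=dict)
    contexts: Dict[str, Tuple[Model, str]] = field(default_factory=dict)

    def add_test_dataset(self, name: str, rows: List[Row], labels: List[int]) -> None:
        if not rows or len(rows) != len(labels):
            raise ValueError("rows and labels must be non-empty and equal in length")
        self.datasets[name] = (list(rows), list(labels))

    def add_context(self, model_name: str, model: Model, dataset_name: str) -> None:
        # The dataset is referenced by its identifier, so it must exist first.
        if dataset_name not in self.datasets:
            raise KeyError(f"unknown test dataset: {dataset_name!r}")
        self.contexts[model_name] = (model, dataset_name)

    def accuracy(self, model_name: str) -> float:
        model, dataset_name = self.contexts[model_name]
        rows, labels = self.datasets[dataset_name]
        hits = sum(model(row) == label for row, label in zip(rows, labels))
        return hits / len(rows)


# Each model gets its own test dataset, so feature sets may differ.
setup = ComparisonSetup()
setup.add_test_dataset("test_old", [(0.2,), (0.9,)], [0, 1])
setup.add_test_dataset("test_new", [(0.2, 1.0), (0.9, 0.0)], [0, 1])
setup.add_context("model_old", lambda row: int(row[0] > 0.5), "test_old")
setup.add_context(
    "model_new", lambda row: int(row[0] > 0.5 and row[1] < 0.5), "test_new"
)
```

Keeping the datasets in a separate registry keyed by name is what lets two models with different feature sets be compared side by side.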

12:08 — 1:
Nice! Did you find it intuitive that way?
I spent so much time trying to make that process reasonably smooth hahaha
And flexible enough for scenarios with different features.

12:09 — 3:
Dude, you nailed it.
It took me a bit to get used to it, but that’s on me, I’m a terrible user and didn’t read the docs hahaha

12:09 — 1:
No worries hahaha, but I’m glad it worked anyway, that actually shows it’s intuitive if you managed to use it without the docs!

12:10 — 3:
I was just about to say that hahaha
The only hiccup I had at first (which I quickly figured out) was that I passed the whole pipeline as the model hahaha

12:10 — 1:
Ah, yeah, passing the full pipeline isn’t supported yet.
That’s definitely a good area for improvement, since I get that testing full pipelines with multiple models could be interesting.

12:11 — 3:
Yeah, I think using the model directly makes it more flexible, lots of people don’t save pipelines yet hahaha

12:11 — 1:
Did the generated report make sense?

12:11 — 3:
I’m still analyzing it carefully because there are lots of cool tests,
but yeah, it totally makes sense to switch models, as the results show.

12:11 — 1:
Awesome, that feedback means a lot to me.
Thanks a lot for testing the tool I built, too!
Was it a classification scenario?

12:11 — 3:
Yes, it was, credit card fraud detection.
The previous model has been in production for over a month, so it’s naturally outdated.

12:11 — 1:
Very cool!

12:12 — 3:
Thanks again, I really loved using the tool!
I’m even going to use it in my repo to help with model update decisions.