Friday, February 8, 2013

A Few Good Games and the Prediction Model

There have been a few really good games I haven't had a chance to address yet.  I also want to talk a bit about the Prediction Model and a change I'm making regarding the Forecast.  First off, there were three really big games this week, the first being Troy @ CBA.  It definitely looked like a rout on paper, with CBA winning by solid double digits, and it certainly didn't help Troy to have arguably their best guard suspended for the game, nor does it help that he's suspended for the rest of the season.  While I now believe CBA has a better chance to win sectionals than Troy does (mostly because of the suspension, which the forecast doesn't account for), the forecast model still shows Troy ahead.

In fact, it still lists CBA at #4 behind Troy, Green Tech and Bethlehem, and while I disagree with the Troy rating, I can't say I'm convinced CBA is now more likely to win sectionals than either Green Tech or Bethlehem.  The 19-point swing between Troy's win and CBA's win, each on its home court, can partly be attributed to home court advantage, which would roughly account for 5 to 6 of those swing points.  I don't think anyone knows for certain, but at least part, if not most, of the remaining difference is due to a regression by Troy after losing a player rather than the ascension of CBA.  That said, if you backed me into a corner, I'd probably take CBA; it's hard not to.

The second big game was on the girls' side, as Bethlehem crushed defending champion Colonie.  After a two-point win very early in the season, Bethlehem increased that margin by more than 30 in the second game.  The forecast model has now given Bethlehem a really big margin over Albany, and while I do think both of these teams will be in the final, I don't know that I agree there is a large difference between them.  However, I would give Bethlehem the edge at this point.

The final game was a non-league game featuring former league mates Albany Academy and Watervliet.  This game proved, to me, to be the most telling for sectionals.  Watervliet may have lost, but they were tied late in the 4th quarter with a team that lost to the #1 AA school in the state by only 2 points.  Mekeel Academy can brag about an AA win over Amsterdam, but this is a much more impressive loss to me.  Albany Academy beat Green Tech by 11, who beat Schenectady by 31, who beat Amsterdam by 10.  Very roughly, using that rationale, comparing Watervliet to MCA is like comparing Green Tech to Schenectady.  It's certainly not a foolproof analysis, but it's cemented in my mind what I already believed: that Watervliet is going to run away with the B title and make a run at States.
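The chain of results above can be added up in a few lines of Python.  This is only a rough sketch of the "A beat B, who beat C" rationale, not the forecast model; the helper name `implied_margin` and the data layout are made up for illustration, and summing margins like this ignores home court, matchups and everything else.

```python
# Each entry is (winner, loser, margin), taken from the results quoted above.
chain = [
    ("Albany Academy", "Green Tech", 11),
    ("Green Tech", "Schenectady", 31),
    ("Schenectady", "Amsterdam", 10),
]


def implied_margin(results):
    """Sum the margins along a chain where each loser is the next winner.

    Hypothetical helper: a naive transitive comparison, nothing more.
    """
    for (_, loser, _), (next_winner, _, _) in zip(results, results[1:]):
        if loser != next_winner:
            raise ValueError(f"chain is broken between {loser} and {next_winner}")
    return sum(margin for _, _, margin in results)


# Albany Academy over Amsterdam, by this very rough logic.
print(implied_margin(chain))      # 52
# Green Tech over Amsterdam, using only the last two links.
print(implied_margin(chain[1:]))  # 41
```

By that arithmetic the gap between Albany Academy and Amsterdam is far bigger than the gap between Albany Academy and Green Tech, which is the point of the comparison.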

Finally, I'm going to hold off on releasing this weekend's forecast until my final forecast on Tuesday night.  With no games tonight it makes more sense to wait until the end of the weekend and if I'm holding off until Monday anyway, I may as well just hold it the extra day.  The final forecast will show the final ratings that I have been talking about all this time and where the thresholds are so you know exactly where everyone stands.  Joe talked me into that with his comments and I thank him for that.  If anyone else cares to provide feedback, it's greatly appreciated.

Also, I have run the prediction model and I'm much more pleased with it than I thought I would be.  When I originally started working on this several years ago, I ran a similar model and it was way too biased.  This one seems to have a more mellowed bias almost to the point where I'm considering using it full time next year.  I'll be unveiling this after the seedings come out.  Look for another post Sunday or Monday depending on how much time I have.


  1. This is what I see now.
    1. Hoosic Valley
    2. Maple Hill
    3. BKW
    4. Fort Plain
    5. Hoosick Falls
    6. Mechanicville
    7. Mekeel
    8. Duanesburg
    9. Voorheesville
    10. Waterford
    11. Middleburgh
    12. Galway
    13. Lake George
    14. Canajoharie
    15. Greenwich
    16. Schoharie

  2. I understand your pooh-poohing the CHVL, as it's not a strong league, but how do you put Waterford 10 and Maple Hill 2 when they both played Taconic Hills in 2-point games? I know Waterford lost to Duanesburg, but if you use that logic you would need to put Middleburgh (and Mekeel, who beat Middleburgh twice) ahead of Fort Plain, as that was a 20-point game.

    Hoosick Falls played and beat Catskill by 12. Taconic Hills beat Catskill twice by an average of 29.5 points. If TH is 17.5 points better than HF but only 2 points better than Waterford, how do you put HF 5 and Waterford 10? One loss doesn't make a season, and I'm more willing to write off a bad loss to Duanesburg or Middleburgh (for Fort Plain) given the remainder of their schedules than I am to put Waterford behind a 5-11 Voorheesville team, 8 spots away from Maple Hill. Sometimes it's just not clean and easy and the numbers don't support each other.

    Duanesburg might beat Waterford again if they play, but I don't see how their losses to Mekeel, Middleburgh and two to Berne-Knox are more impressive than what Waterford has done. I believe they have earned a higher seed than Duanesburg. And you have every right to disagree.

  3. Hello Christian,

    I enjoy your blog, it is very interesting. I follow the girls more than the boys.

    A couple of questions: do you cap or weight the MOV after a certain number? I don't recall a year when there have been so many blowouts.

    Have you run the numbers with only same-class opponents? I would be curious as to the results.

  4. Hi Michael,

    Thanks for your comment and your interest. I do cap the MOV. Right now, for the girls, I have been using the boys' history, as I only have the one year for the girls in the database. It basically caps the average MOV at 28, so even though Waterford is averaging an MOV of 35, they are credited with a 1.000 in the MOV expected win percentage. What I have been doing is eliminating the best and worst teams each year (roughly 5%, so I'm keeping 95%) and using the range that remains as the total number of points, which is currently 56 (28 positive and 28 negative). So if your average MOV is 28 or higher you have a 1.000 expected win percentage, while if you have a -28 MOV, your expected win percentage is 0.000. If you have a zero MOV average, you'd be expected to have a win percentage of .500.

    I have considered excluding a larger number but overall I don't think that would have much impact. I've also considered capping each game, but I'm not sure of the logistics of that and if I'd have enough time to do it (since this is my third job and it's currently uncompensated).

    As far as running the numbers on the individual classes, I just don't have enough information at this point. Without separating them I get 5 data points in the regression model each season, but if I run them as each class I only get one. Ideally I would like at least 10 years before splitting them to account for any anomalies. The girls tend to have more dynasty teams (Averill Park, Hoosic Valley, Voorheesville), whereas the boys have dynasty leagues (Big 10, Colonial). This is one reason I have been strongly considering using the Prediction model in place of the forecast model: I have more years of sectional games than I do full seasons, and it helps differentiate the classes a bit without being too biased. Again, however, it's just so much more difficult to research the girls than the boys, and I'm missing some games in 2010 sectionals, so I would only have one full year's model coming into this postseason (I currently can go back to 2006 for the boys).

    If I can justify spending the time and the blog makes it to the 2016-2017 season I'll run the individual classes for the girls and see what we get. I could probably run the boys in the 2014-2015 season. Thanks again for reading.
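
The MOV cap described in the reply above can be sketched in a few lines of Python. The reply only gives three reference points (an average MOV of -28 maps to 0.000, 0 to .500, and 28 or more to 1.000); the straight line between them is an assumption made here for illustration, and the function name `expected_win_pct` is hypothetical rather than anything taken from the actual model.

```python
def expected_win_pct(avg_mov, cap=28.0):
    """Map an average margin of victory to an expected win percentage.

    Assumes a linear scale over a total spread of 2 * cap points
    (56 with the current cap of 28), clamped at both ends.
    """
    clamped = max(-cap, min(cap, avg_mov))  # anything past the cap counts as the cap
    return (clamped + cap) / (2 * cap)


print(expected_win_pct(35))   # 1.0, a 35-point average is credited as 28
print(expected_win_pct(0))    # 0.5
print(expected_win_pct(-28))  # 0.0
```

Capping each individual game instead of the season average, as mentioned above, would mean applying the same clamp to every game margin before averaging.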