Results tagged ‘ statistics ’

Mayor’s Race Analytics

In recent years, sabermetrics have revolutionized the study of baseball and other sports. Many other fields have also been influenced by statistical analysis, including politics and elections, to name a couple. But somehow, one very important area has been overlooked by the emerging field of analytics: politicians racing at sporting events.

Until today, that is. With most of the 2011 season in the books, we have enough data to properly analyze the Old Brick Furniture Tri-City Mayors Race, which takes place during the sixth inning of every ValleyCats home game. These new analytical methods will provide critical new insights on the three mayors’ strengths and weaknesses and may help us more accurately predict the final few races.

First, let’s review the current standings:

  1. Brian Stratton, Schenectady, 13
  2. Harry Tutunjian, Troy, 10
  3. Jerry Jennings, Albany, 8

Mayors Tutunjian (center) and Stratton (right) fight for the lead while Jennings lags behind, a rather common sight this summer.

Former Schenectady Mayor Brian Stratton has led the standings for most of the season, and with only five races to go, he seems likely to keep it that way – one more win would clinch at least a share of the title at the season’s end.

But wins and losses tell only a fraction of the whole story. So much more happens during each race before the first mayor reaches the finish line, and all of it is valuable information that we can use to further our understanding of each runner. Let’s explore!

Getting started

If you’ve been to “The Joe” a couple times, you’ve likely seen at least one memorable comeback. After all, the 250-foot track can be grueling for these mayors, and anyone who gets off the blocks too quickly may fade down the home stretch. As the contestants jockey for position before the right-field gate opens, you may wonder, does the opening of the race even matter?

Indeed it does. The mayor who leads out of the gate has held on to win the race nine out of 20 times this season.* This certainly is not a prohibitive advantage – so don’t despair if your favorite politician is straggling down the right-field line early on – but it is clearly better than the one-in-three rate we would expect from random chance.

*In the other 11 races, data was not recorded or the leader was too close to call.
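For readers who want to see how far nine of 20 sits from the one-in-three baseline, here is a minimal sketch of a binomial tail calculation. It assumes the 20 charted races are independent and that, under "random chance," the early leader wins each one with probability 1/3. The function name is made up for illustration, and none of this comes from the original analysis.

```python
from math import comb

def binomial_tail(successes: int, trials: int, p: float) -> float:
    """Probability of at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# The early leader won 9 of the 20 charted races; chance alone would be 1 in 3.
observed_rate = 9 / 20
tail = binomial_tail(9, 20, 1 / 3)
print(f"observed {observed_rate:.0%}, chance of 9 or more by luck: {tail:.3f}")
```

The smaller the printed tail probability, the harder it is to write off the early-lead advantage as luck.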

Troy Mayor Harry Tutunjian is best at getting out in front – he has taken the early lead in half of these 20 races – but he is also the worst at capitalizing on it, winning just 30% of the time that he starts in first. (Tutunjian has actually won a slightly higher percentage of the races that he has not led initially, suggesting that fighting for pole position may not be worthwhile for him.)

Albany Mayor Jerry Jennings has won three of six races with the lead, bringing us to…

Sabermetrics of Stratton

The secret to Stratton’s success is not his performance off the blocks – he has taken the initial lead in only four races, the fewest of any mayor. Instead, like a shorter and slightly slower Usain Bolt, his strength is his ability to pull away from the pack. When Stratton does get an early lead, he is nearly unbeatable – he has lost only once after starting in first place. (Even that race would have been another Stratton victory had he not made the ill-fated decision to turn around and showboat near the end, allowing Tutunjian to pull off a stunning – if karmic – comeback.)

Stratton has the fastest top speed – he ran a blazing 13.92 in mid-July, the best time of any mayor this year by nearly a full second – but a deeper look reveals some very mixed results. He has gone a nearly unthinkable 6-0 in races decided by less than .3 seconds, and while defenders will argue that the pride of Schenectady “just knows how to win,” his performance in close contests is an indicator of good luck. Those wins are in hand – and the biggest reason why he will almost certainly take the end-of-season crown – but he has most likely been racing above his true talent level so far.

Stratton’s average race time this year is 22.05 – nearly a full second slower than Tutunjian’s.

Hard-luck Harry

That will probably come as little solace to the mayor of Troy, who has often played the role of Samuel Tilden to Stratton’s Rutherford Hayes – tantalizingly close but ultimately a loser. Tutunjian has lost four races by less than .1 seconds – and four others within half a second – naturally leading to the most second-place finishes.

And when Tutunjian has won, it has often been by Reagan-like margins. His average margin of victory is 1.1 seconds, nearly twice as high as any other candidate’s. Not even included are five other races in which Tutunjian led by so much that the other two candidates did not even cross the finish line – Jennings and Stratton have only three such victories combined.

With an average race time of 21.10 seconds, easily the fastest of the three contestants, Tutunjian has proven his consistency. Unfortunately, a series of disappointing photo finishes has relegated him to second place – and in the words of the immortal Ricky Bobby, if you ain’t first, you’re last.

Losing focus

Bringing up the rear of the standings is Albany Mayor Jerry Jennings. Unlike Tutunjian, Jennings can’t blame bad luck for his position. Rather, he appears to be suffering a lack of concentration.

It’s natural for runners to break off a race sometimes – if you don’t feel you have a chance of winning, you might as well slow up and save your energy. But Jennings has been too willing to give up on competitions. While Tutunjian and Stratton have failed to finish 13 times between them, Jennings has accumulated 12 DNFs on his own.

Though this might be excusable if it were part of a well-planned strategy, Jennings instead simply seems prone to all sorts of distractions. He has been sidetracked by a fight, a shoving match and even a lightsaber battle (on Star Wars Night, of course). Though they make a great spectacle to entertain the fans, these incidents cost Albany’s mayor valuable time and are a big factor in his poor record.

If you’re looking for positive things to say about Jennings as a runner, start here: he’s a ‘mudder.’ Jennings has won three of the four slowest races this year, and will doubtlessly try to slow things down again this week in an effort to catch Tutunjian for second place.

As the mayors showed off for the fans in June, Jennings took the inside track for his first win of the season.

Dog days

After anonymous reports that an unnamed mayor had been taking illegal substances to gain an edge on his competitors late in the inaugural 2010 season, the ValleyCats instituted a tougher testing program to crack down on the mayors’ excessive use of caffeine and other stimulants. If the numbers are any indication, the testing has worked.

Check out this chart of the times throughout the season:

As you can see, the contestants are slowing down significantly as the calendar turns. As the best-fit line shows, the average time has increased by .22 seconds with every race, a substantial decrease in speed over the course of a season.
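A best-fit line like the one described here can be computed with ordinary least squares. The sketch below is only an illustration: the times are made-up placeholders rather than the real race data, and `best_fit_line` is a hypothetical helper name.

```python
def best_fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical winning times (seconds) by race number, NOT the real data.
race_numbers = [1, 2, 3, 4, 5, 6]
times = [18.0, 18.5, 18.3, 19.1, 19.0, 19.4]
slope, intercept = best_fit_line(race_numbers, times)
print(f"time changes by {slope:+.2f} s per race")
```

A positive slope means the times are getting slower as the season goes on, which is the pattern the chart showed.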

This fatigue manifests itself in other ways as well. In the first four homestands, the mayors were much more likely to finish races than in the last four. This difference is not quite statistically significant (p=.12), but it tells the same story:

These data suggest that players aren’t the only ones who get worn down by a long summer – and, unlike players, the mayors can’t be spelled by a reserve for a day’s rest.

But they’ve been off for eight days now, and the trio should be fresh for the home stretch. Will Tutunjian’s luck change enough for him to overcome Stratton? Will Jennings turn around a rough season? Will Stratton keep winning nailbiters? Watch the final five races at “The Joe,” starting tomorrow, to find out.

Kevin Whitaker

What have we learned?

The season is 30 percent complete, and the team is coming off its first official off day. So let’s step back a bit and take a look at what we’ve learned about this year’s ValleyCats so far:

The starting rotation is good. Euris Quezada has not had the best start to the season, going 0-3 with an 8.83 ERA, but the other four-fifths of the rotation has been anywhere from good to excellent. Juri Perez has the highest ERA of the four at 3.55, and this doesn’t feel unsustainable – all four of these pitchers have the stuff and command to be very good at this level. If the ‘Cats can get the fifth spot figured out, it wouldn’t shock me in the least to see this rotation go on a run like the 2010 team did last August, when all five starters had an ERA below three for the entire month. Now that players have had a few starts under their belt, Tri-City and other teams will be more willing to let their starters go into the sixth and seventh innings, which will magnify the Cats’ starting pitching advantage.

The star of the rotation so far has been Kyle Hallock, who has completed at least five innings in every start and has yet to allow more than two earned runs. Anytime you’re among the league leaders in K/9 and BB/9, as Hallock is entering tonight’s start at Batavia, you’re doing something right. The southpaw has 25 strikeouts against two walks, the best such ratio in the league so far, and ranks fourth with a 0.78 WHIP.

If there’s one candidate for regression among the Cats’ top four starters, it may be Jonas Dufek. Check out these splits: with nobody on base, opponents are hitting .410/.500/.645 off Dufek. But with men on, he becomes “Jonasty,” holding hitters to a .158/.200/.211 line. And with men in scoring position? .114/.184/.200. In a nutshell, Dufek has allowed lots of runners to reach base but has pitched extremely well under pressure. That’s great to see from a mental standpoint, but it’s not likely to be sustainable over a full season – if runners keep reaching base, hitters will eventually get lucky and have bloopers or line drives fall in critical situations, and runs will score. (Of course, leadoff batters aren’t likely to keep getting on base 64 percent of the time either, so it all may even out.)

DIPS likes the pitching staff even more. The ‘Cats have done well in all of the “three true outcome” categories – the team ranks fifth in strikeout rate (K/9), fourth in walk rate and fourth in home run rate allowed. Though they rank sixth in ERA, I have them third in the league in FIP (Fielding-Independent Pitching). The difference can be explained by a .323 batting average on balls in play, the third-highest in the NYPL.

Now, a major caveat here: for major-league pitchers, BABIP has been shown to have very little predictive value – that is, what happens to a ball in play is mostly due to factors that are outside the pitcher’s control. This is not necessarily true in the minors. Minor-league pitchers – especially at a low level such as the NY-Penn League – are very different from major-league pitchers, and it would be reasonable to think that some of them consistently throw pitches that are more likely to go for base hits. (These pitchers would usually be weeded out before reaching the majors.)

In short: while the strong fielding-independent statistics and the high BABIP do suggest that the pitchers have been unlucky (and/or that the defense behind them has been poor), the evidence for that is not as strong as similar major-league numbers would be.
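For anyone who wants to run these numbers themselves, here is a rough sketch of the two statistics in their commonly published forms. The post does not say exactly which FIP variant or constant was used, so the `constant` default is an assumed placeholder, and the staff totals are hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding-Independent Pitching from the three true outcomes.

    `constant` is an assumed placeholder; it is normally chosen so that
    league-average FIP matches league-average ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def babip(h, hr, ab, k, sf=0):
    """Batting average on balls in play allowed."""
    return (h - hr) / (ab - k - hr + sf)

# Hypothetical staff totals, for illustration only.
print(round(fip(hr=8, bb=70, hbp=10, k=190, ip=200.0), 2))
print(round(babip(h=210, hr=8, ab=780, k=190, sf=5), 3))
```

A staff whose ERA sits well above its FIP while its BABIP is high fits the "unlucky or poorly defended" profile described above, with the minor-league caveat still applying.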

The offense needs improvement. This isn’t as clear-cut as you might expect: the ‘Cats actually rank eighth in the league with 4.43 runs per game, though they’re closer to eleventh (Brooklyn) than seventh (Hudson Valley). What’s not obvious is how exactly they’re doing it. Tri-City ranks 12th in batting average (.236), 12th in slugging percentage (.326) and tied for 10th in on-base percentage (.319), a profile that doesn’t usually lead to a league-average offense.

Only one team has left fewer runners on base than the ‘Cats. You could make a convincing argument that the ValleyCats are one of the better baserunning teams in the league, and generally good lineup construction has helped, but it’s hard to escape the feeling that some of this simply comes down to the team getting timely hits at a rate that may not be sustainable.

Plate discipline is not the problem. It feels like batters have watched a lot of third strikes go by at Joe Bruno Stadium this year, and fans of every team feel like their hitters strike out too much, but the ValleyCats’ problem is not their pitch recognition. The ‘Cats are striking out in a tick under 18 percent of their plate appearances, one of the best marks in the NYPL and well below the league average of 20 percent. They have drawn 83 walks against 155 strikeouts, the third-best ratio in the league.

But the ‘Cats just aren’t doing enough when they make contact. Despite playing in Joe Bruno Stadium, recently the league’s best home run park, Tri-City ranks dead last in the league with six dingers, even after hitting three in its last two games. I’d expect a better showing than that in the final 53 games – powerful hitters like Brandon Meredith and Kellen Kiilsgaard will hopefully return to the lineup, and guys like Zach Johnson and Miles Hamblin have shown the potential to hit for more power than they have so far – but this isn’t an offense that will be having too many one-swing rallies.

These outfielders can throw. Okay, we knew that from the start. Drew Muren leads the league with five outfield assists, and Justin Gominsky is tied for second with four. As a team, the ‘Cats have a league-best 11 outfield assists in 23 games, which the pitching staff must love.

Guess what? The ValleyCats have been unlucky. At this time last year, the ValleyCats were 9-14, but they had scored roughly as many runs as they had allowed. I argued that they would play better for the rest of the season, and sure enough they did, greatly surpassing even my expectations.

Well, it’s a year later, and the ValleyCats are 9-14. And guess what? They’ve only been outscored by two runs (104-102). Run differential is a better predictor of future performance than wins and losses. It certainly doesn’t mean another miraculous playoff run is coming – and a slew of difficult opponents in the next two weeks won’t make it easy for the ‘Cats to make a charge soon – but it means we should expect them to play more like a .500 team for the rest of the season than a .400 team. (14-8 Vermont, incidentally, has outscored its opponents by only one run, meaning the Lake Monsters could come back to the pack in the Stedler Division.)
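The gap between record and run differential can be made concrete with the classic Pythagorean expectation. This sketch assumes the original exponent of 2 (other exponents are in common use, and the post does not say which one applies here); the function name is illustrative.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

actual = 9 / 23                       # 9-14 record
expected = pythagorean_pct(102, 104)  # 102 runs scored, 104 allowed
print(f"actual {actual:.3f}, expected from runs {expected:.3f}")
# prints: actual 0.391, expected from runs 0.490
```

Under that assumption, the run totals describe a team a shade under .500, well above the .391 the record shows.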

So although 2011 hasn’t started the way the ValleyCats and their fans would have liked, we could still see some good baseball at “The Joe” over the final seven weeks of the season.

Kevin Whitaker

Three-Horse Race

Playoff Odds update, through 8/31 games: ‘Cats 57%, Vermont 16%, Connecticut 27%

Well, this week didn’t go quite as expected.

After the ValleyCats defeated Connecticut for the fourth time in seven days on Tuesday, it looked like a two-horse race in the Stedler Division: the ‘Cats were hot, Vermont was treading water and Connecticut was fading quickly, three games out.

But everything went right for the Tigers after that. They swept a three-game set at Vermont, and the ‘Cats were swept at Hudson Valley. The Tigers, who had the league’s worst offense entering the series, dropped 21 runs on the Lake Monsters and won the last two games handily. Now they are right back at the top of the division, with momentum and six home games coming up, tied with Vermont and a half-game ahead of Tri-City. It is officially a three-team race.

Momentum would seem to make Connecticut the current favorite, but we just saw how quickly momentum can change. The Tigers now host the juggernaut that is the Brooklyn Cyclones – who, coincidentally enough, will then finish the season with five games against Vermont and three with Tri-City – before finishing with Aberdeen. Vermont still has to play eleven games in the final nine days (two makeups with Brooklyn, although one is the completion of the contest that was suspended in the 12th inning last week) – and, worse, all eleven will be on the road.

The ValleyCats have the easiest remaining opponent of the group when they travel to Lowell next week, but they’ll have to get through three more games with the Renegades, not exactly the team they wanted to see right now. Hudson Valley has won four straight and recently passed the ‘Cats for the fourth-best run differential. The Renegades match up well with the Tri-City offense: the ‘Cats are a very patient bunch, but Hudson Valley has allowed the fewest walks and hit the fewest batters this season.

Ultimately, the ValleyCats still look like the slight favorite, based on the schedule and (more importantly) their play to date. Although they are in third place, they have easily the best run differential of the group (TC +24, VER -12, CT -33), which means they should be expected to play the best from here on out. But time is running out, and they’ll probably need to beat Hudson Valley a couple times this weekend to remain the favorite.

Updated playoff odds:

Tri-City: 41%
Vermont: 33%
Connecticut: 26%

Kevin Whitaker

Updated Playoff Odds

I’ll update this post with the current odds daily. 

Tuesday marked a turning point, as my system now sees the ‘Cats as the favorite.  The rainout hurt Vermont, which won’t get to make up its game with Lowell, while the ValleyCats pretty much knocked Connecticut out of the race by completing another sweep.

Through games of 9/3:

Tri-City: 27%
Vermont: 0%
Connecticut: 73%
=====

Warning: If you don’t like numbers, you won’t find much in this post (or the next) worth reading.

This is an update to my playoff odds post from earlier this week. I’ve corrected a few misconceptions regarding tie-breakers and makeup games and made my model a bit more robust.

The biggest error I had was regarding makeup games for early-season rainouts. For some reason, I was under the impression that rained-out games would be replayed at the end of the season if they affected the pennant race. That is not the case. The ValleyCats’ rained-out games with Jamestown from July and with Aberdeen today will not be played, nor will Connecticut’s game with Staten Island today. Vermont has missed three-plus games so far, but can make up the Brooklyn and Hudson Valley games, plus yesterday’s suspended extra-inning Brooklyn game, because they play those teams again this year. Its rained-out game against Batavia, however, will not be replayed.

The other place I erred was with tie-breakers. I assumed that ties would be broken with a head-to-head game, but that is not the case. Instead, the tiebreakers go as follows: winning percentage, then divisional record, then run differential. It is rare that a tie will go even that far – I predict only a 0.5% chance that run differential comes into play.

The last update is an improvement to my model: an adjustment for home-field advantage. Home teams this year are 226-191 (.542), making home-field advantage a fairly significant factor. Thus, I gave teams playing in their home park a 4% boost in each game*. Note that I said “playing in their home park” – some of the makeup games (i.e., Vermont vs Brooklyn) will not be played at the same place they were initially scheduled; therefore, Vermont will be playing as the “home” team in Brooklyn. In such a case, Brooklyn would get the home-field advantage bump – research has shown that it is playing in a familiar park, not having the last at-bat, that provides the home team with an advantage.

*VERY technical note: In an ideal world, this home-field boost would not be linear – it has a smaller effect with a more lopsided matchup. To illustrate with an extreme example, if a team had a 4% chance of winning on a neutral field, we would not expect it to have a 0% chance of winning on the road. But I couldn’t figure out an easy way to make this effect non-linear, and I expect that all realistic matchups – certainly the ones that I am predicting here – are evenly-matched enough that it doesn’t make much of a difference.
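One common way to get a non-linear adjustment, offered here as an alternative and not as what the model in this post does, is to apply the home-field edge to the odds rather than to the probability. The sketch assumes the league-wide .542 home winning percentage quoted above is the right size for the effect; `home_adjusted` is a hypothetical name, and the input must be strictly between 0 and 1.

```python
def home_adjusted(p_neutral, home_pct=0.542, at_home=True):
    """Shift a neutral-field win probability by a home-field odds ratio."""
    ratio = home_pct / (1 - home_pct)   # odds that a typical home team wins
    if not at_home:
        ratio = 1 / ratio               # road teams get the inverse
    odds = p_neutral / (1 - p_neutral) * ratio
    return odds / (1 + odds)

# Columns: neutral-field probability, on the road, at home.
for p in (0.04, 0.50, 0.60):
    print(p, round(home_adjusted(p, at_home=False), 3), round(home_adjusted(p), 3))
```

With this form an evenly matched home team lands exactly on .542, while a 4% underdog on the road shrinks only slightly and never reaches zero. For matchups as close as the ones in this race, the difference from a flat 4% boost should be small.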

(If anybody is following my work closely, I gave Vermont a 35% chance of winning its suspended game against Brooklyn, down 8-7 with two on and one out in extras. I got that number from this win probability table.)

The numbers:

Tri-City: 41.6%
Vermont: 53.7%
Connecticut: 4.6%

The ValleyCats continue to improve their playoff hopes. Keep in mind that these numbers still don’t take into account momentum – June games count as much as August games do. If you think recent results should carry more weight, you should give the ValleyCats a somewhat better chance than listed here. (For what it’s worth, I do think recent results should count more, given how much rosters and players change in this league, but I haven’t come up with a good way to separate recent performance from schedule effects.)

Kevin Whitaker

ValleyCats Playoff Odds

Note: I have learned that some of my assumptions regarding tie-breakers and makeup games were inaccurate.  I’ll update later today with those revised.

Update as of Sunday afternoon: Each team split its last two games, and the playoff results predictably changed little.  I have the ValleyCats at 36.13%, Vermont at 51.40% and Connecticut at 12.46%.

===
We know the ValleyCats are in the playoff hunt. Tri-City is 1.5 games behind Vermont and a half-game back of Connecticut in the Stedler Division, playing its best baseball as we head towards the home stretch. But what kind of chance do the ‘Cats really have of reaching the postseason?

I put together a quick-and-dirty simulation for the rest of the season in an attempt to answer that question. I’ll try not to go into too many details about how I made the simulation, because I don’t expect that many of you care; leave a comment or email me if you want to know more. But a quick and fairly technical summary: I first figured each team’s pythagorean record, which estimates a team’s performance going forward from its current run differential. Then I plugged those records into Bill James’s log5 formula to figure the odds that each team wins each game. I then used these odds to simulate the Tri-City, Vermont and Connecticut games for the rest of the season*, and played out the season 1,000,000 times. (This task is made a lot easier by the fact that the wild card will almost certainly not come out of the Stedler Division, so I only had to worry about three teams.)

*I included makeups for games that have been lost to rain this year – Tri-City vs Jamestown, Vermont vs Batavia and Staten Island – because they will be played if they affect the pennant race at the season’s end. (My mistake – these games will not be made up.)
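For the curious, the process described above can be sketched in a few lines. This is a simplified illustration, not the actual simulation: the standings, team strengths and schedule are hypothetical placeholders, the first team in each scheduled pair must be one of the three contenders, and the sketch ranks teams by wins alone, which assumes everyone finishes with the same number of games played.

```python
import random
from collections import Counter

def log5(a, b):
    """Bill James's log5: chance a team with win pct `a` beats one with `b`."""
    return (a - a * b) / (a + b - 2 * a * b)

def simulate(wins, strength, schedule, trials=10_000, seed=0):
    """Play out `schedule` repeatedly and tally who finishes with the most wins.

    wins:     current win totals for the contenders
    strength: Pythagorean winning pct for every team in the schedule
    schedule: remaining games as (contender, opponent) pairs
    """
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(trials):
        final = dict(wins)
        for team, opp in schedule:
            if rng.random() < log5(strength[team], strength[opp]):
                final[team] += 1
            elif opp in final:          # opponent is also a contender
                final[opp] += 1
        best = max(final.values())
        leaders = tuple(sorted(t for t, w in final.items() if w == best))
        outcomes[leaders] += 1          # one leader = outright, several = tie
    return {k: v / trials for k, v in outcomes.items()}

# Hypothetical inputs, NOT the real standings, strengths or schedule.
wins = {"TRI": 30, "VER": 32, "CT": 31}
strength = {"TRI": 0.55, "VER": 0.50, "CT": 0.43, "BRK": 0.65, "LOW": 0.40}
schedule = [("TRI", "CT"), ("TRI", "LOW"), ("TRI", "BRK"),
            ("VER", "CT"), ("VER", "BRK"), ("VER", "LOW"),
            ("CT", "BRK")] * 3
print(simulate(wins, strength, schedule))
```

The keys of the returned dictionary are the teams that finish on top, so a one-team key is an outright division win and a longer key is a tie.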

Here were the results:

TRI wins:  30.3244
VER wins:  45.3815
CT wins:  8.3277
TRI + VER tie:  9.5976
TRI + CT tie:  2.2061
CT + VER tie:  2.9302
3-way tie:  1.2325

That comes out to a 16% chance that we’ll end up in some sort of tie. The same log5 process I used above can create odds that each team wins a head-to-head play-in game (there is no play-in game; the tiebreaker is divisional record), allowing us to estimate the full odds that each team makes the playoffs (for simplicity’s sake, I assumed that each team would win the three-way tie one-third of the time):

Tri-City: 37.38%
Vermont: 51.83%
Connecticut: 10.80%
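The arithmetic behind folding the ties into the final odds looks roughly like this. The outright and tie percentages are the simulation results listed above; the two-way tiebreak probabilities are hypothetical stand-ins, since the log5 values actually used are not given, so the output is not expected to reproduce the published figures exactly.

```python
# Simulation output from the post (percent of simulated seasons).
outright = {"TRI": 30.3244, "VER": 45.3815, "CT": 8.3277}
two_way = {("TRI", "VER"): 9.5976, ("TRI", "CT"): 2.2061, ("CT", "VER"): 2.9302}
three_way = 1.2325

# Hypothetical chance that the FIRST team listed wins each two-way tiebreak.
first_wins = {("TRI", "VER"): 0.55, ("TRI", "CT"): 0.60, ("CT", "VER"): 0.45}

odds = dict(outright)
for (a, b), share in two_way.items():
    odds[a] += share * first_wins[(a, b)]
    odds[b] += share * (1 - first_wins[(a, b)])
for team in odds:
    odds[team] += three_way / 3   # three-way ties split evenly, as in the post

print({team: round(pct, 2) for team, pct in odds.items()})
```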

I was surprised that Connecticut’s odds are so low. But if you look at run differential, the Tigers just haven’t been very good this season. They rank dead last in runs scored and have a worse run differential than all but two teams; their Pythagorean record pegs Connecticut as a .426 team, rather than a .500 one. The Tigers are 10-5 in one-run games, and will probably not be as lucky going forward.

The ValleyCats have a better run differential and expected record than Vermont, but the 1.5-game edge in the standings is enough for Vermont to remain the favorite. Still, I can assure you that their playoff odds are as high as they’ve been all season.

Two major caveats come with these results. The first is that my simulation does not currently discriminate between home and road games, treating them all equally. I will probably build in an adjustment for this in the next edition of my playoff odds. The second is less clear-cut. Right now, all of my predictions are based on full-season data, so games in June count just as much as games in August. I am not sure if this is optimal or not, particularly in a league where players get promoted relatively frequently; when I do this again, I’ll consider weighting recent results more heavily. It clearly makes a difference in this race – Vermont is playing terribly of late, while the ValleyCats are hot. If you think recent results are more predictive than early-season games, you should consider Tri-City somewhat more likely to make the playoffs than these numbers suggest, and Vermont somewhat less likely.

Kevin Whitaker

Stedler Division Race

The ValleyCats shut out Connecticut last night, 6-0, as Vermont fell 4-3 to Lowell. The ValleyCats are now just 1.5 games out of a playoff spot.

I’ll say that again: The ValleyCats are just 1.5 games out of a playoff spot.

Back in July, Vermont’s lead over Tri-City was flirting with double digits, and it seemed impossible that the ‘Cats would have an interesting home stretch – most (especially I) thought they would settle for avoiding the basement for the first time since 2006, thanks to Lowell. Well, it now looks like this year’s ‘Cats might fully copy that 2006 team, which won the Stedler Division and made the playoffs.

Vermont’s nosedive in the standings certainly helped. The Lake Monsters have won just eight of their last 27 games, despite playing nine games against last-place teams and only five against teams currently above .500. Vermont, which hasn’t won consecutive games in four weeks, will need to right the ship as soon as possible if it wants to maintain its season-long hold on the division lead.

But it’s not as if the ValleyCats have just stood around while other teams fell. Instead, they’ve been playing extremely well over the past three weeks. Since losing a 13-inning thriller to Connecticut in Cooperstown, the ‘Cats have gone 15-8, including sweeps of Vermont and four wins in six games on their most recent homestand. Only Jamestown has a better record than the ‘Cats this month.

And it’s not as if this is a fluky streak. The ValleyCats are near the middle of the pack in the NYPL record-wise, but after their recent hot stretch, they have the fourth-best run differential in the NYPL. Run differential is a better indicator of true talent, and a better predictor of future performance, than record. So, though it may be hard to believe, the ValleyCats have played like a playoff team in 2010. They have had an average offense but have allowed only 239 runs, fourth-best in the league despite playing in a hitters’ park.

Based on their runs scored and allowed, we would expect the ValleyCats to have a .553 winning percentage this season. But they’re still a game below .500 and 1.5 out in the division race, thanks to some poor luck in one-run games: Tri-City is 5-10 in such contests, worst in the league. (The ‘Cats also are further below .500 in extra innings than anyone else at 2-6.)

So the ‘Cats still have some ground to make up in the division. Fortunately, neither of the teams they are chasing is playing very well. Vermont has been outscored by 16 runs this month, bringing its season run differential down near zero despite an amazing start, and certainly looks headed in the wrong direction. And Connecticut is on the opposite end of the spectrum from Tri-City: it has a run differential of -34, better than only two other teams in the league, with a record bolstered by a 14-8 performance in one-run games. The main culprit for the Tigers has been a futile offense, which ranks last in runs scored.

A total of six head-to-head games remain among these teams: the ‘Cats have three left with Connecticut (two away), while the Tigers play three at Vermont. The Lake Monsters and Tri-City each have three remaining with Lowell, while Connecticut is done with the Spinners.

But for the most part, it looks like the McNamara Division will help settle this race. Vermont is least fortunate schedule-wise, with six games remaining against Brooklyn, but the others also have three games against the league’s top team. Tri-City has to play in Brooklyn, where the Cyclones have been much better this season (22-6 home, 16-14 road), but those are also the last three games of the season, so Brooklyn could rest some players and have less motivation, as it all but wrapped up a playoff spot a long time ago.

Of their other nine games against McNamara teams, the ValleyCats play six against Hudson Valley, which looks like the next-best in the division. But six of the nine (including three with Aberdeen) are at home. Connecticut is a little bit more fortunate, with six against Aberdeen and three home with Staten Island, while Vermont also plays three-game sets with Aberdeen and Staten Island but travels for both.

All things considered, the ValleyCats and Connecticut face a remarkably similar strength of schedule, while Vermont’s is noticeably more difficult (including nine straight on the road to finish the season). Given that the ValleyCats now look like the most talented team in the division, this should be a very interesting race. (See my playoff odds for more.)

Tri-City may also be picking up some help down the stretch: third-round draft pick Austin Wates signed on Monday and will join the ValleyCats tomorrow. College players can often struggle with the transition to pro ball – as those of us who saw Mike Kvasnicka’s first month in Troy know – but Wates has the potential to help this team. He was a terrific hitter in college and had one of the best bats in the entire draft, drawing raves from scouts and evaluators. His long-term position is an open question – second base seems most likely – but for the rest of this season he’ll probably be an outfielder, and he slots into left field nicely for the ValleyCats. (Update: see Evan’s profile of Wates.)

Kevin Whitaker

Fielding in the NYPL

Fielding is hard to evaluate.

It is nearly impossible to judge defense without some level of subjectivity. Even the simplest defensive statistic, fielding percentage, relies on a scorer’s decision regarding whether a play would have been made with reasonable effort. More advanced statistics such as UZR attempt to take subjectivity out of the equation by comparing each batted ball with similar ones and seeing how many fielders made those plays – but (to my knowledge) they make other assumptions that are not always true, such as assuming that fielders begin from the same position and that the batted-ball classification data is accurate.

While analysts have made tremendous improvement over the past five years or so, measuring individual defense remains an inexact science. It is even more so at the lower levels, where we don’t have nearly the data that MLB teams and fans have access to. Fans looking for information on a player’s fielding are limited to their own observation and fielding percentage, which at best paints a very crude picture and basically ignores a player’s range and ability to make difficult plays.

Fortunately, measuring team fielding is easier. In its simplest form, what is fielding about? It is about making plays: turning batted balls into outs.

That’s overly simplistic, and doesn’t account for many variables – double plays, stolen bases, preventing extra-base hits, passed balls and runners taking extra bases, to name a handful. But I think most would agree that the most important job of the fielders as a unit is to turn a batted ball into an out. And that characteristic is pretty easy to measure.

Strikeouts and walks are exclusively pitching stats – the fielders have essentially no say in whether or not those occur. (You could argue that a catcher’s ability to frame pitches could occasionally make the difference in those stats, but that’s a very, very weak effect at best.) Same with home runs – plays like this one aside, the fielders can usually only turn and watch as the ball goes over the fence.

But the balls hit in play? The defense has plenty of control over those. In theory, a perfect defense with lightning-fast players could turn every ball in play into an out. The worst possible defense, by contrast, would never record an out on a fair ball, standing still and dropping even the balls hit right at them. Actual teams, obviously, lie far from these extremes. Good-fielding teams will turn more balls into outs than poor ones, and we can measure this.

The statistic I just described is Defensive Efficiency Rating (DER), and if you’re already familiar with it, you skimmed through the last few paragraphs because you knew all that already. The formula is:

DER = 1 – ((H + Reach on Error – HR) / (PA – BB – SO – HBP – HR))

One caveat: MiLB does not keep data for “reach on error”. The only statistic available is total errors, which includes botched pickoff throws, throwing errors from the outfield, throwing errors on the back end of double plays and other plays that solely advance runners and don’t put runners on base. I estimated the number of errors that put an opposing batter on base as two-thirds of total errors (ROE = 2/3 * E).
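
For anyone who wants to run the numbers themselves, here is a minimal Python sketch of the calculation above, including the two-thirds estimate for reach-on-error. The function name, its parameter names and the sample team line are made up for illustration; they are not real NYPL totals.

```python
def defensive_efficiency(h, hr, pa, bb, so, hbp, errors, roe_share=2 / 3):
    """Estimate Defensive Efficiency Rating (DER) from team pitching totals.

    roe_share is the assumed fraction of total errors that put a batter
    on base (the post's estimate: ROE = 2/3 * E).
    """
    roe = roe_share * errors                 # estimated reach-on-error count
    balls_in_play = pa - bb - so - hbp - hr  # plate appearances the fielders could affect
    if balls_in_play <= 0:
        raise ValueError("no balls in play to evaluate")
    return 1 - (h + roe - hr) / balls_in_play


# Hypothetical team line, NOT real data: 300 hits, 20 homers, 1400 batters faced,
# 120 walks, 280 strikeouts, 20 hit batsmen, 60 errors.
der = defensive_efficiency(h=300, hr=20, pa=1400, bb=120, so=280, hbp=20, errors=60)
print(round(der, 3))  # 0.667
```

In this invented example, 960 balls were put in play and an estimated 320 batters reached on them, so the defense turned two-thirds of those balls into outs. Changing roe_share shows how sensitive the rating is to the two-thirds assumption.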

So, which teams are best at converting balls in play into outs?

[Table image: NYPL teams ranked by defensive efficiency]

Despite the league’s fourth-best fielding percentage, the ValleyCats best only two other teams in defensive efficiency. This is what I expected when I started this – opposing hitters post good batting averages off Tri-City pitchers despite striking out often. Oscar Figueroa has great range when he plays short but neither Healey nor Orloff is anything to write home about in that respect – both are good defensively, but more for their hands than their range. Kiké has good range at second, especially to his left, but the ValleyCats have been breaking in a bunch of new players at third (Kvasnicka, Orloff, Figueroa), who don’t read balls as well off the bat yet. Adamson and Infante have great speed in the outfield, but the latter didn’t make good reads in center (a problem I’ve seen much less of since he moved back to left).

It’s important to keep these ratings in mind when evaluating pitchers – a ValleyCats hurler will see one fewer out per 20 balls in play than a Williamsport pitcher will.* The Crosscutters are 31-19 atop the Pinckney Division, in no small part because of arguably the league’s best defense. Williamsport comes to Troy tomorrow for a three-game series.

*Implied in that sentence was that getting outs in play is the full responsibility of the fielders. This is not a discussion I really want to fully get into here, but for those who might be unfamiliar with it: in the past decade or so, it has been generally accepted that major-league pitchers have little control over what happens to a ball in play. (An exception is that knuckleballers, submarine pitchers and other unconventional throwers usually allow fewer hits than standard pitchers.) Note that this has not been shown to be true or untrue at the minor-league level.

Kevin Whitaker

Midseason Report

Today marks the midway point in the NY-Penn League season. Thirty-eight of the 76 scheduled games remain, although some teams have a couple more due to weather postponements. Tri-City has played 36 games and stands at 15-21. The ValleyCats seem certain to finish out of the cellar for the first time since 2006 – they’re already eight games up on 8-30 Lowell – but the record is still a bit of a disappointment to a team that has seemed inconsistent.

The pitching was scary good early in the year, while the offense was scary in a completely different sense, threatening the Mendoza line with a June batting average of .192. But both sides have moved closer to league-average levels. At the midway point, the ‘Cats are batting .243 and rank eighth in the league with 170 runs scored. Their ERA is up to 4.08, and only four of the league’s 14 teams have allowed more than their 179 runs.

Quite a few ValleyCats have heated up in the past week or two. Mike Kvasnicka was batting just .152 and slugging .207 ten days ago, but has been on fire for the past week. In his last eight games, Kvasnicka is batting 15-for-36 (.417) with two homers, six extra-base hits and 11 RBI.

A couple of reserves have earned more playing time with recent hot streaks. Tonight’s DH Buck Afenir has gone 5-for-11 in the team’s last ten games to raise his season batting average to .314. Afenir’s biggest hit came at Cooperstown on Saturday, when his pinch-hit double in the ninth inning brought home Dan Adamson with the game-tying run. Shortstop Jacke Healey had only four hits on the season at the start of last week, but homered in back-to-back games against Brooklyn and Aberdeen, then had consecutive two-hit games at Vermont over the weekend.

Kiké Hernandez has been unstoppable for the entire month of July. The second baseman hit just .152 in the first month of the season but has hit safely in 20 of 21 games this month, upping his season average to .295.

Here’s a look at where everybody in the NYPL stands thus far, sorted by run differential:

[Table image: NYPL teams sorted by run differential, with Tri-City’s remaining games against each]

The last column represents the number of games Tri-City has remaining against each team. As you can see, the schedule was pretty front-loaded, and the ValleyCats will generally face easier opponents from here on out. That starts with a three-game home series against Lowell tonight – the Spinners come in having lost 13 of their last 14 contests. Only 14 of the ‘Cats’ 39 remaining games come against teams that currently have a positive run differential. (Note: this assumes they will not make up the rained-out game against Jamestown, which will only be played if it has playoff implications at the end of the season.)

The ValleyCats have been unlucky this year – based on their run differential, we would have expected them to win about 17 games, but they are actually 15-21. And they’ve faced a tough schedule to this point, playing a lot of games against the league’s better teams. I don’t think it’s unreasonable to expect the ValleyCats to play .500 or even a bit better in the second half.
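
The 17-win figure can be checked with a few lines of Python. This is only a sketch: it assumes the estimate comes from the standard squared-runs (Pythagorean) formula, and the function name is made up for illustration. The inputs are the midseason totals quoted above: 170 runs scored, 179 allowed, 36 games played.

```python
def expected_wins(runs_scored, runs_allowed, games):
    """Wins implied by run differential, assuming the squared-runs formula."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return games * rs2 / (rs2 + ra2)


# Tri-City at the midway point: 170 scored, 179 allowed, 15-21 in 36 games.
implied = expected_wins(170, 179, 36)
luck = 15 - implied  # negative means fewer wins than the runs suggest

print(round(implied, 1))  # 17.1
print(round(luck, 1))     # -2.1
```

So the ValleyCats sit roughly two wins below what their runs scored and allowed would suggest.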

Their playoff chances, however, are still very remote. Brooklyn currently has the league’s best record, at 25-13. If the ValleyCats played like the league’s best team in the second half, they would finish at 41-35 or so. Five teams are currently on pace to have a better record than that, and another two aren’t far behind. So even if the ‘Cats play .650 ball from here on out – which only one team did in the first half – they would still probably have no better than a 50-50 shot at reaching the postseason.
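
Here is a rough sketch of that projection in Python. It assumes 40 games remain (76 scheduled minus 36 played, so it counts the rained-out Jamestown game), and the function name is hypothetical.

```python
def project_record(wins, losses, games_left, pace):
    """Final record if the team wins at the given pace over its remaining games."""
    extra_wins = round(games_left * pace)
    return wins + extra_wins, losses + games_left - extra_wins


# 15-21 now, 40 games left (an assumption), playing .650 ball the rest of the way.
print(project_record(15, 21, 40, 0.650))  # (41, 35)
```

Dropping the Jamestown makeup (39 games left) gives 40-35 instead, which is why "41-35 or so" is the right level of precision.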

But that doesn’t mean the season is lost. The ValleyCats seem very likely to post their best record since 2006, and may be able to reach .500 by the end of the season. For a team that seemed incapable of scoring a run one month ago, that wouldn’t be a bad ending.

Kevin Whitaker

ValleyCats: Unluckiest team in NYPL

The ‘Cats got a big win tonight, thrashing Auburn 11-2 to snap a three-game skid. Tyler Burnett had a monster night, going 3-for-3 with a homer, a double and two walks; his 19 bases on balls rank second in the New York-Penn League. Ben Heath added a homer and a double, while Mike Kvasnicka notched his first extra-base hit since Opening Day. Tri-City pounded out 15 hits and didn’t commit an error while turning three double plays. Carlos Quevedo posted his fifth consecutive quality start, allowing two runs in six innings for his second victory of the season.

The 11 runs marked a season high for the ValleyCats. They also reached another milestone you may not have noticed: with the blowout victory, the ‘Cats have now scored more runs on the season (112) than they have allowed (111).

That’s right, Tri-City has outscored its opponents this year. You would not expect that from the standings, however: the ‘Cats stand at 10-15 (.400), ahead of only two other teams in the NYPL.

It has generally been accepted in baseball (and in most other sports) that run (or point) differential is a better indicator of a team’s true ability than winning percentage. This is because the binary of “win” vs “loss” tells us relatively little about how well a team played in a given game. Run differential helps us get a better picture – a team that wins by nine runs generally played better than a team that wins by one. Over a larger sample, wins and losses cumulatively give us a pretty good picture of a team’s talent, but run differential will usually tell a more complete story.

Run differential and winning percentage often agree, but there are times when they don’t, such as for the ValleyCats this season. Based on run differential, we would expect Tri-City to be about .500; instead, they’re .400. Generally, the difference between expected and actual winning percentage is chalked up to the vague term “luck”.

There is one factor, unique to baseball, that often explains the difference between expected and actual winning percentage: bullpen performance. If a team’s bullpen is lousy, it might lose more than its share of close games, which would hurt its winning percentage more than its run differential. However, it is easy to see that this theory does not fit the ValleyCats this year. The Tri-City bullpen has been far from weak; it has been outstanding, with a 2.68 ERA. I can’t find any sortable stats for bullpens around the NYPL, but the league-average ERA is 3.92, and the best pitching staff (Vermont) is at 2.75. Even after allowing for the fact that relievers generally have a lower ERA than starters, the ‘Cats have still had an excellent bullpen.

So, if anything, we would expect the ValleyCats to be overperforming their run differential, instead of playing well below it. Without any other likely explanation, I have to conclude that the ‘Cats have simply suffered some bad luck, and they’re more likely to play like a .500 team than a .400 team going forward.

Looking at run differential, there are a few clear-cut tiers in the NYPL:

[Table image: NYPL teams grouped into tiers by run differential]

Jamestown, Brooklyn and Vermont sure seem to have separated themselves from the pack. Vermont has already all but clinched the Stedler Division, while the other two currently lead by small margins and should be expected to pull away. But the race for the fourth-best team is a real mess; according to run differential, 15-11 Williamsport is no better than 10-15 Tri-City. There are eight teams that are a good or bad game away from a zero run differential, which is awfully rare. Then the two Valley teams are a clear cut below, with Lowell unsurprisingly bringing up the rear.

Sabermetric pioneer Bill James devised a method of predicting winning percentage from run differential called the Pythagorean Expectation. The equation, which has held up well over the two or three decades since its inception, is fairly simple:

Expected WP% = RS-squared / (RS-squared + RA-squared)

A team that has allowed as many runs as it has scored would be expected to have a winning percentage of .500, and the results are similarly intuitive for other inputs (basically, each extra run has less and less effect on winning percentage).
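
To make the formula concrete, here is a small Python sketch. The function name is my own choice; the first example uses the ValleyCats’ totals from this post (112 runs scored, 111 allowed), and the second uses round, invented numbers just to show the diminishing returns.

```python
def pythagorean_wpct(runs_scored, runs_allowed):
    """Expected winning percentage: RS-squared / (RS-squared + RA-squared)."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    return rs2 / (rs2 + ra2)


# Tri-City so far: 112 scored, 111 allowed, actual record 10-15 (.400).
print(round(pythagorean_wpct(112, 111), 3))  # 0.504

# Invented example: a team that allows 100 runs adds 10 runs of offense at a time.
previous = pythagorean_wpct(100, 100)
for scored in (110, 120, 130):
    current = pythagorean_wpct(scored, 100)
    print(scored, round(current - previous, 3))  # gain from the latest 10 runs
    previous = current
```

In the invented example, each additional 10 runs is worth a little less than the last: roughly .048, then .043, then .038 of winning percentage. That shrinking payoff holds for teams that score at least about as much as they allow; for teams that are badly outscored, the curve bends the other way.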

So, compared to their Pythagorean expectation, which teams have been the luckiest and unluckiest? You won’t be surprised by the bottom team (positive = lucky; negative = unlucky).

[Table image: NYPL teams ranked by actual record versus Pythagorean expectation]

According to run differential, Tri-City is basically as talented as any other team in the league, save the top three. The ValleyCats’ playoff hopes look awfully slim, despite this good news – their recent bad fortune has left them 4.5 games back and behind seven other teams in the wild-card race, which is a very difficult hurdle to overcome under any circumstances. But if their run-scoring and run-allowing rates stay roughly the same, the ValleyCats seem likely to win a bit more frequently than they have thus far.

Kevin Whitaker
