Latest news with #evaluation


New York Times
5 days ago
- Sport
- New York Times
Would Zaccharie Risacher still go No. 1 in a 2024 NBA re-draft?
Every year after the NBA season ends, I think it's worth evaluating the previous year's draft class. So it's time to re-draft the 2024 NBA Draft. On some level, this is a silly exercise, if only because we're still working with incomplete information on these players' careers. We're essentially using one year of information about each of these players as pros and using it as a piece of the puzzle to evaluate what their future careers look like. We're still projecting an immense amount here, given that quite a few of these players just turned 20 years old.

However, that year of information is important to the evaluation process. This isn't a flip-flop exercise; it's simply taking what we know now and applying it to what we believe will happen. No one drafts a player for what they will do in their rookie year. So heading into Year 2, it's worth looking at who the most impressive players in the class are, and who I'm still betting on.

This was the worst rookie class at the top end on a production basis since at least the 2016-17 season, when Malcolm Brogdon won Rookie of the Year. That also matches the sentiment on this class coming into the draft, as I ended up ranking zero players in either Tier 1 or Tier 2 of my draft grades, which I reserve for players who are good bets to make an All-Star game. Indeed, in this class, I still don't see any player as a great bet to reach that level. Undoubtedly, someone will. Even the disastrous 2000 NBA Draft class, without question the worst class of the last 25 years from a production standpoint, saw three players make a single All-Star game in Michael Redd, Kenyon Martin and Jamaal Magloire. Ultimately, though, it's hard to bet on any player in this class to do so.

I wouldn't take these rankings as more than a snapshot in time. There is a lot still to play out. While I don't have a significant number of players here whom I think are surefire starters, I believe several of last year's draftees can stick as solid rotational players. The class looks somewhat deep in that respect. The number of players from a single draft who typically stick as rotational difference-makers tends to be around 20 to 25. Even though I am skeptical of the high-end talent of some of these players, I expect the 2024 NBA Draft to hit the higher end of that rotational player range. So without further ado, let's re-draft:

1. Stephon Castle
Actual pick: No. 4

Castle won the Rookie of the Year award and deserved it. It was a bit of a roller coaster early on for him, as he'd showcase terrific monthlong stretches followed by months when he was a bit less valuable. However, from right around the start of March onward, Castle was tremendous through the end of the year, averaging 19.3 points, 5.6 assists and 5.2 rebounds while shooting 46 percent from the field and drawing six free-throw attempts per game over his final 25 games. He handled the ball more consistently during that time and showed passing and playmaking chops while still bringing the defensive mindset that made him so valuable to begin with. At 6-foot-6, 215 pounds, Castle has a great frame and has already started to figure out how to weaponize it, using his pace and poise in ball screens at a tremendous level. Ultimately, the question for his upside is how he scores the ball consistently and efficiently. We know that he can pass and make quick decisions. We know that he's a physical defender who can guard up and down the lineup. But his jumper has to improve.
He hit only 28.5 percent of his 3s last season and doesn't have much of a track record that you can point to with his shooting trajectory. If you made me bet right now on one player reaching any sort of All-Star ceiling from the class, it would be him. But with Castle surrounded by De'Aaron Fox and Dylan Harper in San Antonio, I question how much on-ball responsibility he will receive long term. Regardless, Castle proved at Connecticut that he can be a winning player in any role, as he has an innate understanding of space and how to use it. He gets himself into dangerous areas by thinking quickly and reacting faster to what's happening around him than anyone else. His basketball IQ is very high, and even if the jumper doesn't come along, I'd bet on him being the kind of non-shooter who impacts a playoff series.

2. Zaccharie Risacher
Actual pick: No. 1

I felt like his season went a bit under the radar until his final stretch of games, which resulted in his second-place Rookie of the Year finish. Simply put, he is an incredibly smart off-ball wing and a threat to score when he's open. He drilled nearly 36 percent of his 3s and consistently made shots on the interior. He has an innate sense of where the dangerous areas are on the court; he's constantly cutting or relocating beyond the 3-point line to threaten defenders. His timing on cuts to the rim is superb, and he always seems to be watching for when his defender is going to help so that he can either sprint to the rim backdoor or lift to the wing. Defensively, he isn't particularly disruptive, but he's proven to be useful with his willingness to rotate. He maps the court well and knows where he's supposed to be, plus he has some positional versatility. Over his final 35 games, Risacher averaged 14.9 points and 3.6 rebounds and shot 51.8 percent from the field, 42.1 percent from 3 and 71.6 percent from the free-throw line. He looks like a terrific starter-quality wing who will stick around for quite a long time.

3. Jared McCain
Actual pick: No. 16

McCain only got to do it for about a month while the 76ers were dealing with injuries, but no player flashed more upside as an offensive weapon last season than he did. From Nov. 8 until Dec. 4, he averaged 21.7 points, three rebounds and 3.2 assists while shooting 47 percent from the field, 40 percent from 3 and 85 percent from the line in 13 games. And these were games at the beginning of the season, when teams were still highly competitive, as opposed to some of the late-season runs we saw after a large portion of teams were out of contention. He had 23 points against the New York Knicks, 34 points and 10 assists against the Cleveland Cavaliers, 29 on the road against the Orlando Magic, 20 in back-to-back road games against the Miami Heat and Memphis Grizzlies, and then 30 against Brooklyn in the middle of the Nets' most competitive run of the year. No rookie put together a better stretch of games than that all season. McCain's comfort handling the ball shined, as he was freed up following an injury to Tyrese Maxey to be the 76ers' creative force. We know he will likely develop into one of the NBA's truly elite shooters, with pristine mechanics and superb touch. However, his ability to create and attack for himself stood out and gave him what looks to be serious upside long term. Alas, McCain's last game came on Dec. 13, as he tore his meniscus. The fit with Maxey could be messy long term because they're both smaller, but McCain's upside looks to be quite high as an offensive player.

4. Reed Sheppard
Actual pick: No. 3

I'm fully aware of the hate I'm going to get for saying that I'd still take Sheppard ahead of some players below him after the rookie season he had. But if I'm hunting for upside, I'm still riding with Sheppard. I don't see much in terms of potential star power below this level. But I can at least see a world in which Sheppard maximizes his passing and shooting ability into becoming a serious difference-maker. Sheppard was stuck behind teammates on a loaded Rockets team that finished second in the West. He saw only 654 minutes, and in those minutes he was not particularly effective. He struggled on defense and was inefficient as a scorer as his 3-point shot didn't fall. Still, Sheppard remains a high-level ball-screen creator with awesome vision and playmaking ability mixed with elite shooting from distance. His pace and tempo were strong at summer league, and I thought he played much better than many did, given that the team surrounding him was quite poor even by summer league standards. His performance against the LA Clippers in particular was high level, showcasing many of the traits we saw at Kentucky when he got a chance to run the show, and at summer league in 2024. Houston clearly has faith in him, having opened up playing time for him this year while also only committing to Fred VanVleet for two more years on a contract that I think is quite tradable.

5. Donovan Clingan
Actual pick: No. 7

I would have voted Clingan first-team All-Rookie last year. He was tremendous on the defensive end, particularly in the back half of the year after he got a chance to play following Deandre Ayton's injury. He's not going to be the sexiest player statistically, but he has a serious chance to impact winning. In his final 28 games, Clingan averaged 8.8 points, 10 rebounds and 1.8 blocks while shooting 55.6 percent from the field. Beyond that, his positioning in drop coverage and rim protection remained elite. When Clingan was on the court last season, the Blazers gave up only 112.4 points per 100 possessions, per PBPStats. That was four points better than when he was off the court and the equivalent of a top-10 defense in the NBA. In his 1,324 minutes last year, the Blazers were essentially a break-even team despite losing the minutes without him by about five points per 100 possessions. Clingan gobbles up rebounds and is incredibly sharp positionally. He held opponents to just a 49.5 percent mark around the rim on shots that he contested. He has a chance to be an All-Defense-caliber center, especially given the Blazers' decision to focus on the defensive end this offseason.

6. Matas Buzelis
Actual pick: No. 11

I'm not quite as high as some on his upside, as I'm skeptical of his half-court shot-creation ability because of his high hips and inability to gain leverage with his lack of strength. But I buy Buzelis as a likely solid starting-caliber wing because of his size, athleticism in the open court and shooting ability. The full-season numbers look pedestrian, but in his last 35 games, he averaged 13.3 points, 4.5 rebounds and 1.7 assists, and shot 49.4 percent from the field, 37.3 percent from 3 and 81.7 percent from the line. He's not quite as sharp with his timing as a cutter as Risacher, but he moves well without the ball, and his athleticism, when he can load up in space, allows him to be a threat opponents must track.
His shot off the catch fell at a solid level this year, something we had seen from him in high school at Sunrise Christian but which abandoned him during his yearlong sojourn with the ill-fated G League Ignite program. Defensively, I'd like to see him be more disruptive and engage at a consistently higher level if his role is going to be more of an off-ball play finisher. But as long as the shot is there and it opens up his cutting, there's a big-time role for a player who is this big and athletic in open space.

7. Ron Holland
Actual pick: No. 5

Holland played every game except one last year in Detroit and was a clear rotation guy thanks to his toughness on the defensive end and athleticism in the open floor on offense. He took on tough defensive assignments in his 15 minutes per game and created havoc with his ability to crash around on the glass and in loose-ball situations. Offensively, his game was simple, and he rarely got guarded because of his lack of shooting ability, something he has clearly worked on this offseason if his performance at summer league is any indication. Maybe I'm over-indexing on a three-game summer league performance, but more than the results, the jumper mechanics look to have been positively overhauled. The ball looks very clean and fluid through his load-up and is purer coming out of his hands. If he can prove to be even an average shooter, he has a chance to be a serious upside play in a class lacking those kinds of bets.

8. Zach Edey
Actual pick: No. 9

Edey looks like the prototypical play-finishing big. He averaged nine points and eight rebounds in 22 minutes per game this season, hitting 58 percent from the field and consistently being a threat around the rim. More importantly, the team was dominant in the minutes he shared the court with Ja Morant. Edey is a bulldozing screener who loves physicality, much like Steven Adams. So it should come as no surprise that in Morant's minutes with Edey, the Grizzlies outscored opponents by nearly 10 points per 100 possessions, posting an offensive rating over 121. And on defense, the pairing of Edey with Jaren Jackson Jr. saw the Grizzlies allow only 109.6 points per 100 possessions, a mark that would have been top three in the NBA. With the Grizzlies looking to incorporate even more ball-screen action with Morant this season, Edey's game is a perfect fit. However, he'll need to get healthy after undergoing surgery in June to stabilize his left ankle. He's expected to miss the start of the season and will be re-evaluated in October. Still, he looks like a long-term starting center in the NBA.

9. Jaylen Wells
Actual pick: No. 39

Wells was probably the rookie who had the biggest impact on winning basketball for the first three-quarters of the season, as his willingness to take on tough defensive assignments for the Grizzlies was incredibly valuable when mixed with his 3-point shooting. At the very least, Wells can knock down shots off the catch, plus provide a strong outlet option in transition for the team's up-tempo offensive look. He tailed off a bit late in the year but was rightfully a finalist for Rookie of the Year. So why does he come in at No. 9? I'm worried about his upside, even if he's already established himself as a borderline starting-quality player. He hasn't shown much off the bounce yet.
The name that comes to mind as a point of comparison is Josh Richardson, a player drafted in the second round in 2015 who immediately emerged as a rotation player, was a terrific two-way role player for nearly a decade and made $62 million before going overseas this year. He never really elevated beyond that level, and I'm worried Wells might not either. Still, that's worth a lottery pick in this class, and it's a win for both the Grizzlies and Wells.

10. Kel'el Ware
Actual pick: No. 15

Ware is another play-finishing big man. I went with Edey over him because Edey is a better screener and defender, but Ware has more upside long term if he can focus on the details while playing for the Heat. Over his final 42 games last year, Ware averaged 11.1 points, 9.7 rebounds and 1.2 blocks while shooting 54.4 percent from the field and making a 3 every other game. Few bigs can sky out of ball screens higher and quicker than Ware can when he slips and gets downhill. He's a serious threat to throw down massive lob dunks at any point, and he moves like a wing despite being 7 feet tall with a 7-foot-5 wingspan. I tend to over-index on big men doing the non-negotiables at the center position, and right now Ware doesn't do them. That's why he came in at No. 10. I did not think he was a particularly good positional defender last year. The Heat were excellent defensively when both Ware and Bam Adebayo were on the court, posting a 110.5 defensive rating, and his ability to act as a deterrent inside certainly opens up lineup constructions that allow the Heat to be great on defense. But I'd like to see him continue to grow on that end, both in space and in his overall activity level. Additionally, while part of the appeal of Ware is certainly that ability to leave a screen early and slip to the rim to beat a help defender to a lob, he needs to keep working on his overall screening ability. The good news is that he landed in the perfect spot, as the Heat are very detail-oriented and sharp developmentally.

11. Alex Sarr
Actual pick: No. 2

I ranked Sarr No. 1 before last year's draft, but I would not have been quite as high on him if he'd been entering the draft this year because of some adjustments in how I look at center prospects. With Sarr, I got a bit caught up in the idea of how he can expand his game at the center position. Undeniably, he moves incredibly fluidly for his size and has the potential to switch. He has the potential to shoot it, too, and is comfortable firing from distance. His passing this past season was a revelation, especially out of short rolls. He's very comfortable with the ball, and I'm a huge proponent of every team needing a five-out look on the offensive end. Sarr can bring that. However, he does not do the non-negotiable parts of being an NBA center to a startling degree. He doesn't have great hands when it comes to grabbing contested rebounds and can be moved around when he tries to anchor his position. The same goes on defense, where he can struggle to deal with stronger players and get sealed off positionally. On offense, Sarr is one of the worst finishers to enter the league at the center position in a long while. He made just 45.1 percent of his layups in half-court settings this past season, per Synergy, which is among the lowest marks I've ever seen for a big. He made only 55.4 percent of shots at the rim, the second-worst mark in the league among centers who took at least 150 shots at the rim, ahead of only Andre Drummond. He's not a particularly good screener, either.
Sarr's upside remains quite high if he can get stronger, become a better screener and somehow fix the catastrophic finishing issue around the rim. But without that, it might be quite difficult to get him into lineups that end possessions effectively on both ends of the court.

12. Ajay Mitchell
Actual pick: No. 38

Mitchell was a tremendous second-round pick, earning a legitimate role on the title-winning Thunder before a turf toe injury held him out for three months. He was the team's backup point guard and was very useful, running the offense with confidence and poise while efficiently knocking down shots and finishing at the rim. Mitchell consistently plays on balance and off two feet and is a very capable orchestrator when surrounded by talented players. Beyond that, his finishing craft around the rim is extremely high-level for his size. To become more than a backup, Mitchell will need to be a better passer and playmaker. But there are some outcomes where I can see him turning into a starter if he continues to grow and mature his game, given his confidence level out of ball screens.

13. Kyshawn George
Actual pick: No. 24

George and Wells are probably the biggest surprises of the 2024 draft cycle for me and certainly represent my two biggest misses. At Miami, I did not think George was particularly athletic or impressive on the defensive end. He just did not give the kind of effort or display the awareness that made me think he would turn into a positive defender, even at his size. And yet, in his rookie season, he was a positive defender on a particularly bad defensive team. He was consistently active and engaged and made the opposition's life harder. Offensively, George doesn't have a ton of vertical explosion to finish at the rim and has been hot and cold as a shooter. I think he'll likely knock down shots off the catch at some point, and he's an awesome, quick decision-maker as a passer off closeouts, but his role might end up being limited to just that. Still, he looks like a win for the Wizards' front office.

14. Nikola Topić
Actual pick: No. 12

I kept Topić right around where I had him ranked and where he was drafted, after he missed all of last season recovering from an ACL tear suffered late in his pre-draft season. The 6-foot-6 lead guard is a terrific ball-screen playmaker and passer with serious size. He reads the defense on the second and third levels incredibly well and is always a threat to make a cross-corner or cross-wing kickout on his drives. He lived in the lane in the Adriatic League and should be able to find paint touches consistently. Ultimately, his upside will be determined by whether he can consistently score from distance in the NBA. His jumper has been up and down off pull-ups. But in Oklahoma City, he should be able to recover slowly and keep working on his game in a strong developmental ecosystem.

15. Kyle Filipowski
Actual pick: No. 32

I had Filipowski as a top-20 player in the class, and that's about where I think he sits now. He had a tremendous rookie season, and he's my favorite of the Jazz's three draft picks within the top 35 from last year. I thought he should have won All-Rookie honors, as he averaged nearly 10 points and six rebounds per game on an efficient 50 percent from the field and 35 percent from 3 as a stretch, playmaking big. He particularly thrived in the final third of the season, averaging 14.5 points and eight rebounds on 50.6 percent from the field and 37 percent from 3 over his final 29 games.
He's willing to play physically and is tougher than he gets credit for on the interior, plus he can handle the ball and pass a bit more than people think. It will all come down to what kind of level he can reach on defense. If he proves able to defend at the four or five, he has a chance to be a starter. If not, he'll likely settle in as this generation's Kelly Olynyk, a high-level third big who helps some really good teams.

16. Bub Carrington
Actual pick: No. 14

Carrington played more minutes than any other player in the draft class and was a bit of a mixed bag. It's worth noting he was one of the youngest players in the class, though, asked to play point guard, one of the toughest positions, as a teenager. I didn't think he was quite ready for such a large role. He fights on defense but is skinny. He is a silky ballhandler but can't really create any rim pressure because of his lack of strength and explosiveness. However, his natural feel for the game out of ball screens and his ability to get into his pull-up midrange jumper were quite valuable, and I thought he passed the ball quite well while limiting turnovers and sharing the lead guard role with Jordan Poole. He just turned 20, so it might take time for him to grow into his frame after a late growth spurt in high school. But he has natural gifts that could play up once he gets stronger on the ball. At the very least, he looks like a terrific sixth-man type who could turn into more than that.

17. Terrence Shannon Jr.
Actual pick: No. 27

Shannon's athleticism explodes every time he hits the court. Even though he's already 25, it's easy to envision him as a player with upside remaining because of how athletic he is. There were moments in the playoffs when Shannon looked like a serious impact player for an excellent Wolves team, even if a large portion of those minutes came in blowout situations. He didn't play often, but when he did, he looked like he belonged, even while playing for a team that made the conference finals. As long as he keeps shooting and doesn't make too many bad decisions, he's an interesting replacement for what the Wolves lost in Nickeil Alexander-Walker. He should earn a backup role this year for a team with championship aspirations.

18. Tristan da Silva
Actual pick: No. 18

When the Magic drafted Da Silva, they were hoping for a big wing who could play smart basketball, eat up rotational minutes, play his role defensively and knock down shots. Outside of the shooting, which clocked in at only 33.5 percent from 3 as a rookie, that's essentially what they got. He played 74 games, started about half of them because of the Magic's injuries and was useful as a cutter and off-ball wing. I don't see a ton of upside with Da Silva, but as long as the shot keeps falling and he gets stronger, he should be a good rotational wing for a while.

19. Isaiah Collier
Actual pick: No. 29

This is around where I had Collier ranked entering draft night, for most of the reasons he displayed as a rookie. He led all rookies in assists, averaging 6.3 per game. He's a bowling ball with explosiveness as a driver and out of ball screens, and he is excellent at keeping his eyes up and maintaining his vision outward as he forays into the paint. He's a sharp, on-point passer who lives in the lane. His playmaking for others is not in question. The question, though, is how he scores. He posted a true shooting percentage about 14 percent below the league average and has never been that impressive as a shooter from distance.
He made only 24.9 percent from 3 last year and doesn't have much of an in-between game. Until he figures that part out, and I think he's starting at a major skill deficit there, he profiles best as a backup. The good news is that he's already shown enough to stick in that capacity.

20. Ryan Dunn
Actual pick: No. 28

Dunn was an impact defender for the Suns last year, which is exactly the role expected of one of the premier defenders in college basketball coming out of Virginia. Whether he can become more than that will depend on whether he can knock down shots. The good news is that his shot showed massive improvement this year, as he was confident enough to fire from distance 3.6 times per game. The bad news is that he made only 31.1 percent of them and shot under 50 percent from the free-throw line. The jumper looked better at summer league, and I thought he looked more confident and active as a cutter there, too. He projects as a solid rotation player with size and length who could be an impact bench player.

(Top photo of Stephon Castle shooting over Kyshawn George: Greg Fiume / Getty Images)


Geeky Gadgets
31-07-2025
- Business
- Geeky Gadgets
Introducing Align Evals: The Ultimate Tool for AI Precision and Efficiency
What if evaluating the performance of large language models (LLMs) could be as precise and seamless as setting a GPS to your destination? With the rapid rise of LLM applications in everything from creative writing to technical problem-solving, making sure these models meet user expectations has become a critical challenge. Yet, traditional evaluation methods often feel like navigating uncharted terrain—time-consuming, inconsistent, and prone to misalignment between machine outputs and human judgment. Enter Align Evals, a new feature introduced by LangSmith, designed to bring clarity and structure to the evaluation process. By aligning machine-generated assessments with human-labeled benchmarks, Align Evals promises not only greater accuracy but also a streamlined workflow that enables users to refine their applications with confidence.

LangChain explains how Align Evals transforms the way developers and researchers evaluate LLM-generated outputs. From its ability to detect and resolve misalignments to its iterative prompt refinement tools, Align Evals offers a comprehensive framework for achieving consistency and reliability in LLM applications. Whether you're perfecting recipe titles or tackling complex technical content, Align Evals adapts to your unique scoring criteria, making sure your outputs align with human expectations. By the end, you'll discover how this tool not only saves time but also enhances the quality of your applications, bridging the gap between innovation and precision. The question is: how will you harness its potential?

Streamlining LLM Evaluations

The Purpose and Role of Align Evals

Align Evals is built to make the evaluation of LLM outputs both accessible and precise. Its primary objective is to determine whether machine-generated content meets specific scoring criteria by comparing it to human-labeled benchmarks. This alignment process minimizes discrepancies, ensures evaluations reflect human judgment, and ultimately enhances the overall quality of LLM outputs. By bridging the gap between human expectations and machine-generated results, Align Evals enables users to create more reliable and consistent applications.

How the Align Evals Workflow Operates

The workflow of Align Evals is designed to simplify the evaluation process while maintaining flexibility and adaptability. It follows a structured, step-by-step approach:

- Gathering representative sample runs: collect outputs from your LLM application that represent the range of its performance.
- Labeling samples with human expertise: use human input to create a reliable benchmark for evaluation.
- Iterative refinement of prompts: continuously adjust and refine prompts to ensure the LLM's evaluations align with human-labeled data.

This iterative process ensures that the evaluation remains dynamic, allowing you to adapt as your application evolves. By following this workflow, you can identify and address inconsistencies, ensuring that your LLM application meets the desired standards.
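To make that loop concrete, here is a minimal sketch of the gather-label-refine cycle in Python. It illustrates the general technique only and is not the LangSmith API; `call_llm`, the prompt template, and the data shapes are hypothetical stand-ins.

```python
# Minimal sketch of the align-an-evaluator loop described above.
# All names here are hypothetical stand-ins, not the LangSmith API.

from dataclasses import dataclass


@dataclass
class Sample:
    output: str       # an LLM-generated output to be judged
    human_label: int  # human benchmark: 1 = meets the criteria, 0 = does not


JUDGE_PROMPT = (  # iterate on this wording until alignment() is high enough
    "You are grading an LLM output against this rule: {criteria}\n"
    "Output to grade: {output}\n"
    "Reply with a single digit: 1 if it meets the rule, 0 if it does not."
)


def call_llm(prompt: str) -> str:
    """Hypothetical model call; swap in your provider's client here."""
    raise NotImplementedError


def judge(output: str, criteria: str) -> int:
    """Use the LLM itself as a judge to score one output against the rule."""
    reply = call_llm(JUDGE_PROMPT.format(criteria=criteria, output=output))
    return 1 if reply.strip().startswith("1") else 0


def alignment(samples: list[Sample], criteria: str) -> float:
    """Fraction of judge scores that agree with the human labels.

    Steps 1 and 2 of the workflow produce `samples`; step 3 is rewording
    JUDGE_PROMPT and re-running this until agreement is acceptable.
    """
    agreed = sum(judge(s.output, criteria) == s.human_label for s in samples)
    return agreed / len(samples)
```

Measuring agreement against a human-labeled benchmark, rather than simply trusting the judge's scores, is the core of the alignment idea the article describes.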
Handling Evaluations and Scoring Criteria

Align Evals enables you to use the LLM itself as a judge to score outputs against predefined criteria. For example, if you are evaluating recipe titles, you might establish a rule to avoid unnecessary adjectives or overly complex phrasing. By iteratively refining prompts and evaluators, Align Evals ensures the scoring process aligns with your specific standards. This approach not only enhances the accuracy of evaluations but also helps identify and resolve misalignments effectively. The tool's ability to adapt to different scoring criteria makes it suitable for a wide range of applications. Whether you are evaluating creative content, technical outputs, or user-facing text, Align Evals provides the flexibility needed to meet your unique requirements.

Key Features of Align Evals

Align Evals is equipped with a comprehensive set of tools designed to support and streamline the evaluation process. These features include:

- Evaluator creation and modification: build, test, and refine evaluators to assess LLM outputs effectively.
- Iterative prompt refinement: continuously improve prompts to align machine evaluations with human-labeled benchmarks.
- Misalignment detection and resolution: identify discrepancies between machine and human evaluations and address them systematically.
- Progress tracking tools: monitor alignment improvements over time to ensure consistent evaluation quality.

These features work together to provide a robust framework for evaluating LLM applications. By using these tools, users can achieve greater consistency, accuracy, and efficiency in their evaluation processes.

A Practical Example: Evaluating Recipe Titles

To illustrate the functionality of Align Evals, consider a scenario where you are tasked with evaluating recipe titles. Your goal might be to ensure that the titles are concise, clear, and free from unnecessary adjectives. Using Align Evals, you can follow these steps:

- Define the evaluation criteria: establish clear rules, such as avoiding overly descriptive language or ensuring brevity.
- Label sample titles with human input: create a benchmark by labeling a set of sample titles according to the defined criteria.
- Refine the LLM's evaluation prompts: adjust prompts iteratively until the LLM's scoring aligns with your expectations.

This process not only saves time but also ensures that the evaluation outcomes are consistent and aligned with your goals. By automating parts of the evaluation while maintaining human oversight, Align Evals strikes a balance between efficiency and accuracy.
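Continuing the hypothetical sketch from earlier, the recipe-title walkthrough might look like the following; the titles, labels, and criteria string are invented for illustration.

```python
# Reuses Sample and alignment() from the sketch above (hypothetical, not a real API).
labeled_titles = [
    Sample(output="Lemon Garlic Roast Chicken", human_label=1),
    Sample(output="The Most Amazingly Moist Super-Easy Chicken Ever!", human_label=0),
    Sample(output="Weeknight Miso Soup", human_label=1),
]

criteria = "Recipe titles must be concise and avoid unnecessary adjectives."

# Step 3: check judge/human agreement, then reword JUDGE_PROMPT and repeat
# until the agreement rate stops improving.
rate = alignment(labeled_titles, criteria)
print(f"Judge agrees with human labels on {rate:.0%} of samples")
```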
Inspiration and Availability

Align Evals draws inspiration from Eugene Yan's research on 'Align Eval,' which emphasizes the importance of aligning LLM evaluations with human preferences. Now widely available, Align Evals offers a user-friendly interface and a suite of powerful tools to enhance the evaluation process. Its design prioritizes accessibility and precision, making it an invaluable resource for developers and researchers working with LLM applications. By incorporating insights from research and practical use cases, Align Evals provides a reliable and adaptable solution for evaluating machine-generated outputs. Its availability ensures that users across various industries can benefit from its capabilities, improving the quality and reliability of their LLM applications.

Enhancing LLM Applications with Align Evals

Align Evals represents a significant advancement in the evaluation of LLM-generated outputs. By aligning machine evaluations with human-labeled data, it ensures greater accuracy, reliability, and consistency. Whether you are refining prompts, addressing misalignments, or defining specific scoring criteria, Align Evals offers a structured and efficient solution to meet your needs. With its robust features and intuitive design, this tool enables users to align LLM-generated content with human preferences, streamlining the evaluation process and enhancing the quality of applications.

Media Credit: LangChain


Washington Post
26-06-2025
- Health
- Washington Post
How to manage ADHD at work and turn it into a strength
NEW YORK — Jeremy Didier had taken her son to a psychologist for a possible ADHD evaluation when she spotted an article about women with the condition. As she read it in the waiting room, she thought to herself: They're describing me. 'Lots of risk-taking, lots of very impulsive behavior growing up,' Didier said. As the magazine described, she'd excelled in school but gotten in trouble for talking too much. She'd amassed too many speeding tickets as an adult. She turned to her husband and said, 'I think I might have ADHD.'


Forbes
25-06-2025
- Business
- Forbes
Why AI Benchmarking Needs A Rethink
Siobhan Hanna, SVP and General Manager, Welo Data.

AI models are evolving at breakneck speed, but the methods for measuring their performance remain stagnant, and the real-world consequences are significant. AI models that haven't been thoroughly tested could result in inaccurate conclusions, missed opportunities and costly errors. As AI adoption accelerates, it's becoming clear that current testing frameworks fall short in assessing real-world reasoning capabilities—pointing to an urgent need for improved evaluation standards.

The Limitations Of Current AI Benchmarks

Traditional AI benchmarks are structured to evaluate basic tasks, such as factual recall and fluency, which are easy to measure. However, advanced capabilities like causal reasoning—the ability to identify cause-and-effect relationships—are more difficult to assess systematically despite their importance in everyday AI applications. While most benchmarks are useful in gauging an AI model's capacity to process and reproduce information, they fail to assess whether the model is truly 'reasoning' or merely recognizing patterns from its training data. Understanding this distinction is crucial because, as S&P Global research notes, AI's reasoning ability directly impacts its applicability in tasks like problem-solving, decision making and generating insights that go beyond simple data retrieval.

Additionally, the prompts used to evaluate the majority of AI capabilities are primarily in English, neglecting the diverse linguistic and cultural contexts of the global marketplace. This limitation is especially relevant as AI models are increasingly deployed around the world, where the demands for accuracy and consistency are vital across languages, as recently discussed by Stanford Assistant Professor Sanmi Koyejo.

The Multilingual Blind Spot

Most datasets used for evaluating causal reasoning are designed with English as the primary language, leaving models' abilities to reason about cause-and-effect relationships in other languages largely untested. Languages exhibit significant diversity in their grammatical structures, morphological systems and other linguistic features. If the models have not been sufficiently exposed to these differences, their ability to identify causality can be impacted.

My company, Welo Data, conducted an independent benchmarking study across over 20 large language models (LLMs) from 10 different developers, revealing just how significant this issue is. The evaluation used story-based prompts that required contextual reasoning to test advanced causal reasoning capabilities across languages, including English, Spanish, Japanese, Korean, Turkish and Arabic. The results:

• LLMs often struggled with these complex causal inference tasks, especially when tested in languages other than English.
• Many models showed inconsistent results when interpreting the same logical scenario in different languages.

This inconsistency suggests that these models fail to account for linguistic differences in the way humans reason and convey causality. If AI is to be useful across diverse languages, benchmarking frameworks must evolve to test language model proficiency across linguistic boundaries.
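As a minimal sketch of what such a cross-lingual check could look like, the snippet below runs one causal scenario in two languages and compares the answers. Everything here is illustrative: `ask_model` is a hypothetical client, and the prompts and crude keyword checks stand in for professionally translated, properly graded test items; this is not Welo Data's benchmark.

```python
# Illustrative cross-lingual causal-consistency probe (hypothetical, simplified).

def ask_model(prompt: str) -> str:
    """Hypothetical model call; swap in your provider's client here."""
    raise NotImplementedError

# One causal scenario rendered in several languages. A real benchmark would use
# many vetted scenarios and human or rubric-based grading; these are placeholders.
scenario = {
    "en": "The road was icy, so the driver braked early. Why did the driver brake early?",
    "es": "La carretera estaba helada, así que el conductor frenó antes. ¿Por qué frenó antes el conductor?",
}
cause_keyword = {"en": "icy", "es": "helada"}  # crude stand-in for real answer grading

def causal_consistency() -> float:
    """Fraction of languages in which the model names the correct cause.

    A model that reasons about causality, rather than matching surface patterns,
    should identify the same cause regardless of the prompt's language; scores
    that swing by language flag the multilingual blind spot described above.
    """
    correct = sum(
        cause_keyword[lang] in ask_model(prompt).lower()
        for lang, prompt in scenario.items()
    )
    return correct / len(scenario)
```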
The Causal Reasoning Gap

Causal reasoning is a crucial aspect of human intelligence, allowing us to understand what happens and why it happens. The study found that many AI models still struggle with this fundamental capability, particularly in multilingual contexts. While these models excel at pattern recognition, they often fail to effectively identify causal relationships in scenarios that require multistep reasoning. This gap is a significant limitation when deploying these models in real-world scenarios, such as healthcare, finance or customer support, where accurate and nuanced decision making is critical.

Existing benchmarks often simplify cause-and-effect scenarios to tasks that rely on well-established datasets or pre-defined solutions, making it difficult to determine whether the model is truly reasoning or simply reproducing learned patterns. One promising direction involves using more complex, human-crafted testing scenarios designed to require genuine causal inference rather than pattern recognition. By incorporating such methods into evaluation frameworks, organizations can more clearly identify where models fall short—especially in multilingual or high-stakes applications—and take targeted steps to improve performance.

A New Way Forward: Evolving AI Testing For The Future

To truly understand how AI models will perform in functional settings, testing methodologies must assess the full range of cognitive abilities required in human-like reasoning. There are several ways companies can adopt better AI testing methodologies:

• Implement a multilingual approach. AI models must be evaluated across multiple languages to ensure they can handle the complexities of global communication. This is especially important for companies operating in diverse markets or serving international customers.
• Incorporate complex, real-world scenarios. Focus on evaluating AI through scenarios where multiple factors and variables interact, allowing for an accurate measurement of AI's capabilities.
• Emphasize causal reasoning with novel data. Prioritize assessing causal reasoning abilities using previously unseen scenarios and examples that require a genuine understanding of cause-and-effect relationships. This ensures the AI is demonstrating true causal inference rather than pattern matching or recalling information from its training data.

Paving The Way: Building Better AI

Existing benchmarks often do not accurately assess the full range of AI's capabilities, which can leave businesses with incomplete or misleading information about how their AI models perform, depending on which benchmarks are used and their specific objectives. As AI continues to evolve, so too must the methods used to evaluate its performance. By adopting a comprehensive, multilingual and real-world testing approach, we can ensure that AI models are not only capable but also reliable and equitable across diverse languages and contexts. It's time to rethink AI benchmarking—and, with that, the future of AI itself.


Health Line
06-06-2025
- Health
- Health Line
Seeing a Psychiatrist for the First Time
At first, I was scared and overwhelmed by the thought of seeing a psychiatrist. The first psychiatric evaluation is, in my experience, the most intense and lengthy of your entrance into the world of seeing a psychiatrist. I've been going to the psychiatrist for over 5 years. Read on for more about my experience. Initial appointment My evaluation was in person with my doctor (in early 2020, pre-pandemic), and my partner also came along as a support person to provide his insight and answer questions as needed. My psychiatrist manages a mental health program at a hospital, and that's how I got connected with him for my first appointment. There are many ways to find a psychiatrist. Many psychiatrists do not take insurance, but some do. Your insurance may also reimburse you, if that's part of your plan. If you pay out of pocket, it can be expensive: I've been quoted anywhere from $400+ for the initial intake. Follow-up appointments are usually cheaper, but still cost at least $100 each. Some specialty mental health programs receive grants that may make the appointments more affordable. Ahead of the appointment, you might be asked to complete a questionnaire to help your psychiatrist evaluate your symptoms. The questionnaires are usually about anxiety, depression, and more. You'll also provide your family medical history, or what you know of it. Completing the questionnaires may feel overwhelming, so it's a good idea take breaks as you work your way through them, if you are able. It may also feel scary. For me, I worried that the psychiatrist was going to discover a new diagnosis that I had not already been given. That's probably a typical trauma response: 'What is wrong with me? Am I broken? Will I always feel like this?' were all thoughts that I had when initially accepting that I needed a psychiatrist for my mental healthcare. My first appointment was close to an hour long and involved a long conversation with my doctor. I felt mentally drained and exhausted afterward, so I rested for the afternoon. A few years later, I can say that having a psychiatrist whom I trust has made all the difference in my ability to care for myself and my mental health. Follow-up appointments Follow-ups tend to be shorter, at about 30 minutes or less, depending on how you're feeling. My psychiatrist and I chat about my mood and the potential side effects of the medications. We discuss how things are going in my life, including work, socialization, exercise, sleep, hobbies, energy levels, and more. It's not a therapy appointment, but more like a check-in about overall wellness. Then, we discuss any potential medication modifications, and my psychiatrist answers questions that I may have. Appointment frequency with your psychiatrist will vary based on your care. Follow-ups with my psychiatrist are more routine now; I go four times a year. It's important to meet with them periodically, as needed, depending on how you're doing. If I've made a medication change, whether switching to a different medication or upping or lowering a dose, I'll meet with them more frequently to check in. I now meet with my psychiatrist via video, using an app the hospital system utilizes for secure connection.