Latest news with #OxyCon2025


Int'l Business Times
6 days ago
The Future of AI and Web Scraping: A Feedback Loop in Motion
The term "unicorn" is sometimes used to describe people like Zia Ahmad, whose professional versatility makes them a rare find. A back-end developer experienced in web scraping. A data scientist who is also an AI engineer, contributing to the improvement of Google's Gemini. Yet Zia says that his true passion is teaching, and he has over 40 published courses to prove it. Zia teaches on a variety of topics, with a primary focus on data science, web scraping, and AI applications. This motley mix of topics will also come together in his presentation, titled "The AI-Scraper Loop: How Machine Learning Improves Web Scraping (and Vice Versa)," which he will give at OxyCon 2025, a web scraping conference organized by Oxylabs.

After a very versatile career journey, you have landed at Turing. Could you tell us more about your current work?

One day, I got a call from Turing saying they wanted me to work on Google's project, and that was really exciting because I was already a user of this product called Gemini. Now they wanted me to work on it, which was a great opportunity. I'm responsible for the aspect of Gemini that deals with data analysis, data science, machine learning, and AI questions. When you feed Gemini an Excel or CSV file and ask it to generate insights, build a machine learning model, or calculate the average of a column, we actively monitor Gemini's responses, analyze what it's doing right or wrong, write reports, and submit them. We also suggest improvements to the back-end engineers based on our findings.

You also teach many online courses. Do you have a signature or dream course?

One dream course I still want to create is on the very topic I'll be speaking about at OxyCon: how we can merge artificial intelligence and web scraping. These are the two areas I've worked in professionally, and there's a sweet spot where these two fields can complement each other beautifully. And I've found that there are very few "bridge courses" that connect two technologies like this.

The topic of web scraping and AI, especially web scraping for AI, is currently quite controversial. There's a lot of tension between content creators, publishers, and AI companies that need data to train models. What's your view on this?

My view is this: if the data is publicly available, there shouldn't be any question about scraping it. Whether it's an article, product info, or public medical data, if it's viewable manually, it should be scrapable. For example, consider an e-commerce company that lists product details publicly on its website. That information is already out there for anyone to see. Scraping it just makes the process faster and more efficient. If I didn't scrape it, I could hire 100 people to manually go through the site and collect the same information. That's also doable, so why not automate it? However, if the data is private (for example, stored in someone's Google Drive), then it absolutely shouldn't be scraped. In fact, even manual access to private data without permission shouldn't be allowed. That's the boundary I draw. Public data, which anyone can access, should be fair game for scraping. Private data, which is not openly available, requires explicit consent.

Turning to your presentation at OxyCon, could you give us a sneak peek? What are you planning to share with the audience?

AI and web scraping can form a loop. I'll explain how that loop could work, where data scraped from the web helps train AI models, and how those models, in turn, improve scraping.
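To make that loop concrete, here is a minimal sketch in Python. It is purely illustrative: every name in it (Record, ScraperModel, scrape, expert_review) is hypothetical rather than part of Ahmad's tooling or any real scraping library, and the "model" is just a stub that replays validated URLs.

```python
# Hypothetical sketch only: none of these names come from Oxylabs, Turing, or any
# real library; they simply illustrate the loop with a human validation gate.
from dataclasses import dataclass, field

@dataclass
class Record:
    url: str
    payload: dict
    validated: bool = False

@dataclass
class ScraperModel:
    """Stand-in for an ML model that learns which pages or fields are worth extracting."""
    examples: list = field(default_factory=list)

    def train(self, records):
        # Learn only from expert-validated records so mistakes don't compound.
        self.examples.extend(r for r in records if r.validated)

    def suggest_targets(self):
        # In practice this might rank URLs or propose extraction rules;
        # here it simply replays known-good URLs.
        return [r.url for r in self.examples][:10]

def scrape(urls):
    # Placeholder for real fetching and parsing; fabricates dummy payloads.
    return [Record(url=u, payload={"title": f"page at {u}"}) for u in urls]

def expert_review(records):
    # Human-in-the-loop gate: a domain expert accepts or rejects each record
    # before it is allowed to train the model.
    for r in records:
        r.validated = True  # in reality, some records would be rejected here
    return [r for r in records if r.validated]

model = ScraperModel()
urls = ["https://example.com/products/1", "https://example.com/products/2"]
for _ in range(3):
    raw = scrape(urls)                      # scraped data feeds the model...
    clean = expert_review(raw)              # ...but only after expert validation
    model.train(clean)
    urls = model.suggest_targets() or urls  # ...and the model steers the next scrape
```

The expert_review gate is the point Ahmad returns to below: without validation at both ends, errors made by the model would be fed straight back into the scraper and multiply.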
I'll cover both the benefits and the potential downsides of this feedback cycle. While not everything about it is practically feasible just yet, it's an exciting concept. I'll be discussing what's possible today, what might come in the future, what the blockers are, and how we might overcome them. I also run a small business focused on data labelling and annotation, so I'll talk about how data annotation fits into this loop.

Why isn't it feasible now? Is it because artificial intelligence is not at that level at the moment?

Yes. No matter how intelligent your AI agent is, it's still going to make mistakes. And if there's no one monitoring the process, you can end up in a vicious loop where those mistakes multiply at an exponential rate. That kind of growth in error is really dangerous. There needs to be a domain expert involved. For example, if the data is related to medical science, then a medical expert should validate both the input coming from web scraping into AI and the output going back from AI into the scraping system.

When you think about the scale of that, having domain experts label all the data going in and out of AI models, it sounds massive.

Of course. That's one of the biggest challenges in data labelling: you need a lot of people. There has to be validation at both ends, before data goes in and after it comes out. When I say "domain expert," it doesn't always mean someone with a PhD. For example, if we're working with traffic image data, a regular person can label that. In that context, they are the domain expert. However, in other cases, like annotating MRI scans or X-rays, we do need medical professionals. That's very expensive. The same applies to financial documents: we need experts to annotate those too, and again, it costs a lot. Wherever real people are involved, the cost goes up.

You mentioned that your own business is related to solving this problem. Could you expand on that?

Yes. I run a data annotation company called Prism Soft. We have contracts with AI companies, especially those working in computer vision. We receive large volumes of image-based or video-based data, sometimes even text or audio, and our job is to annotate it. That means we prepare the data so it's ready to be fed into AI models. Before we implemented our AI-assisted workflows, everything was done manually. For example, if there's an image with 20 cars, and the client wants bounding boxes drawn around each car, someone would have to draw all 20 boxes by hand. That takes time, and time means money. When you're working with millions of images, the costs skyrocket. That's exactly what we've worked to solve, and with considerable effort, we were able to automate roughly 60% of the process (a simplified sketch of this kind of AI-assisted pre-annotation follows the interview). I think that's the best we can achieve for now, given current technologies. However, we're working on increasing that percentage, and the pace of AI development is truly remarkable. If I predict something will happen in five years, it might actually happen in five months. You never know. There are so many factors involved: data, computational power, and engineering capabilities.

The speed of AI development makes people uneasy. Some fear we might soon lose control over AI. What is your opinion on this?

That's actually the most frequently asked question. Whenever I'm on a flight and someone asks what I do, the follow-up is usually, "Do you think the Terminator scenario will come true? Will AI destroy us?" If you interview 10 AI experts, you'll likely get 10 different opinions on this.
But the world is already taking steps to control AI. One important development is something called XAI, or explainable AI. When new models come to market, they're trained not to respond to harmful or inappropriate questions. When we train a model like Gemini, the first question is always: "Is it safe?" Safety is a top priority. Large, reputable organizations put a lot of focus on this. Smaller players may not, but they often don't have the data or computational power to create models that could be dangerous at scale. That said (again, a potentially controversial point), we don't know what military or intelligence agencies in powerful countries are doing with AI. If we can build models like ChatGPT or Gemini, then obviously, AI can also be used in ways that raise serious ethical concerns.

Another source of concern is the changing job market. People fear that their jobs will be taken from them.

AI has already transformed many industries. Think about content creation. We used to have teams of ten content writers. Now, one person can do the job by asking AI to write the content and then simply proofread it. AI isn't taking jobs away; it's shifting the job market. And this has happened throughout history. New technologies create new types of jobs, and those who don't evolve with the changes risk being left behind. So yes, this is just history repeating itself, but at a much faster pace. I think evolution is the only thing that can keep us relevant. If I sit still and think my current skill set is enough, there's a 99.9% chance I'll be replaced. If not by AI, then by someone more up-to-date with new technologies. Even AI itself is constantly evolving. It's not the same as it was in 2015. There are tons of new models being developed, and AI is now one of the most active research areas globally. That means new technologies are being introduced daily. If you don't keep up, you will fall behind.

Some say that AI is developing so fast now that even if it stopped evolving for a whole year, we'd still struggle to catch up with everything. Do you think we even have a chance?

That's true. AI is evolving so quickly that even now, I find myself learning about tools and models that were released two years ago. The sheer volume of development is massive: there are only 24 hours in a day, and it's hard to keep up. I think the key is to stay focused on your area. Whatever your profession is, whether you're a content creator, developer, or analyst, just make a habit of exploring AI tools relevant to your field once a week.

Some people believe, although I don't agree, that AI won't continue evolving at the same pace because it has already consumed most of the high-quality, human-generated data available. The argument goes: now that AI has absorbed the historical data, there's not as much new, meaningful information being produced, especially if most of the new data is AI-generated. Why do you think that's not the case?

Because every day, we see a new AI model that's better than the one that came out yesterday. Especially in areas like agentic AI, models that can perform daily tasks, we're seeing rapid improvement. If AI can start handling routine tasks for employees, that's a significant leap. So if we've already fed AI the last 100 years of history, feeding it again doesn't help. Now we're just adding new data one day at a time. And while that's happening slowly, AI is still getting better.

So what's still driving this rapid improvement?

Engineering.
Every day, researchers are coming up with new techniques for how to train AI using the same data in smarter ways. Think of it like a classroom: there's one teacher and 40 students, but each student learns differently. Some will score 95%, others 70%, and some maybe 30%. Everyone has a different learning technique. Same with AI—there's the same data, but with different architectures, training strategies, and optimization techniques, you can extract new value. Some approaches require powerful hardware, such as GPUs and TPUs. Others are more hardware-efficient. The techniques are constantly evolving, which is what drives AI forward. So the big question now is: how long can we keep coming up with better, more revolutionary models? And my answer is—there are six or seven billion people in the world, and that means there are six or seven billion possibilities.
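As referenced above, here is a simplified sketch of the AI-assisted pre-annotation workflow Ahmad describes at Prism Soft: a detector proposes bounding boxes, high-confidence proposals are accepted automatically, and the rest go to human annotators. All names are hypothetical and the detector is a random stub; nothing here reflects Prism Soft's actual pipeline, and a real setup would plug in an object-detection model and an annotation tool.

```python
# Hypothetical illustration only: the detector is a random stub, not a real model.
import random
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int
    confidence: float
    source: str = "model"   # "model" = auto-accepted, "human" = reviewed by a person

def detect_cars(image_id):
    # Stub detector: pretend we found 5-20 candidate boxes with varying confidence.
    random.seed(image_id)
    return [Box(x=i * 50, y=20, w=40, h=30, confidence=random.uniform(0.4, 0.99))
            for i in range(random.randint(5, 20))]

def annotate(image_id, auto_threshold=0.9):
    """Accept confident proposals automatically; route the rest to human review."""
    boxes = detect_cars(image_id)
    manual = 0
    for box in boxes:
        if box.confidence < auto_threshold:
            manual += 1
            box.source = "human"   # an annotator would correct or confirm this box
    return boxes, manual

boxes, manual = annotate("image_0001.jpg")
auto_share = 1 - manual / len(boxes)
print(f"{len(boxes)} boxes, {auto_share:.0%} drawn without manual work")
```

Raising auto_threshold trades annotator time for label quality; where the automation rate ends up, such as the roughly 60% Ahmad cites, depends on how strict that threshold has to be to meet the client's quality requirements.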
Yahoo
August 11, 2025
OxyCon 2025 Announces Leading Speaker Line-Up to Converge on AI and Web Scraping
Oxylabs reveals the speakers joining OxyCon 2025, the go-to web scraping event that brings together data professionals worldwide. Experts from Turing, NielsenIQ, and TEGOS Legal will come together with Oxylabs' own team for a free online day full of industry-leading conversations. This year's OxyCon theme is the role of AI in web intelligence, with talks on legal frameworks, real-life applications, and how this affects a range of industries.

VILNIUS, Lithuania, August 11, 2025 - Oxylabs, a leading web intelligence platform and proxy provider, has revealed the 2025 agenda for the industry-renowned event OxyCon. This year's spotlight on AI-powered intelligence will feature experts from Turing, NielsenIQ, and TEGOS Legal. These experts will come together to share insight on the role AI plays in web intelligence and provide actionable advice on how developers can start using it today.

OxyCon annually brings together global industry leaders, technical experts, and data innovators. This year's event will explore how AI is transforming web scraping and vice versa - from smart automation and large-scale data extraction to legal insights and real-world case studies.

Oxylabs CEO Julius Černiauskas said: 'As AI continues to prove revolutionary across industries, access to data has become essential to power the boom. Web scraping is an essential tool in the world of AI, and so it's more important than ever that we all come together and discuss how to access data in a fair and ethical way. I look forward to hearing from global experts on the role of web scraping in powering AI, now and in the future.'

The 2025 agenda demonstrates the strength of the speakers already committed to sharing their thoughts. The sessions for this year's event include:
- Zia Ahmad, Data Scientist at Turing, presenting the applications of machine learning (ML) for improving web scraping methods and the real-life applications of this that he uses day-to-day in his role working on Google's Gemini.
- Fred de Villamil, CTO at NielsenIQ Digital Shelf, giving a walkthrough of how e-commerce data can be scaled in a way that allows organisations to perform data extraction on over 10 billion products per day.
- Denas Grybauskas, Chief Governance and Strategy Officer at Oxylabs and a major voice within the Ethical Web Data Collection Initiative, leading a panel alongside partners from TEGOS Legal and Farella Braun + Martel on the complex legal landscape of AI and web scraping.
- A hands-on session with Rytis Ulys, Head of Data & Analytics at Oxylabs, on building AI-powered price comparison tools with only Cursor and Oxylabs' own AI Studio.

Zia Ahmad, Data Scientist at Turing, shared how timely his session will be: 'AI and web scraping can form a loop,' Ahmad says. 'I'll explain how that loop could work, where data scraped from the web helps train AI models, and how those models, in turn, improve scraping. I'll cover both the benefits and the potential downsides of this feedback cycle.

'While not everything about it is practically feasible just yet, it's an exciting concept. I'll be discussing what's possible today, what might come in the future, what the blockers are, and how we might overcome them.'

To find out more and claim a free place at the virtual event held on October 1st, visit the OxyCon 2025 website.
Everyone interested will also be able to join the OxyCon Discord community, where they can engage with participants, speakers, and industry pros before, during, and after the event.

-ENDS-

About Oxylabs
Established in 2015, Oxylabs is a web intelligence platform and premium proxy provider, enabling companies of all sizes to utilise the power of big data. Constant innovation, an extensive patent portfolio, and a focus on ethics have allowed Oxylabs to become a global leader in the web intelligence collection industry and forge close ties with dozens of Fortune Global 500 companies. Oxylabs was named Europe's fastest-growing web intelligence acquisition company in the Financial Times FT 1000 list for several consecutive years. For more information, please visit:

Media Contacts
Vytautas +370 655 34419
Email: press@