Latest news with #Ollama


Geeky Gadgets
4 days ago
- Geeky Gadgets
OpenAI GPT-OSS Models Optimized for NVIDIA RTX GPUs
NVIDIA and OpenAI have collaborated to release the gpt-oss family of open-weight AI models, optimized for NVIDIA RTX GPUs. These models, gpt-oss-20b and gpt-oss-120b, bring advanced AI capabilities to consumer PCs and professional workstations, enabling faster and more efficient on-device AI performance. By using NVIDIA's GPU technology, the models deliver faster on-device performance, enhanced efficiency, and greater accessibility for developers and AI enthusiasts. They feature a modern architecture, extended context lengths, and support for a range of AI applications, and are accessible to developers and enthusiasts through tools like Ollama and Microsoft AI Foundry Local.

Key Highlights of GPT-OSS Models

Two Models, Tailored for Performance

The easiest way to test these models on RTX AI PCs, on GPUs with at least 24GB of VRAM, is the new Ollama app. Ollama is fully optimized for RTX, making it ideal for consumers looking to experience the power of personal AI on their PC or workstation. The gpt-oss family consists of two distinct models, each tailored to specific hardware requirements and performance needs:

gpt-oss-20b: Designed for consumer-grade NVIDIA RTX GPUs with at least 16GB of VRAM, such as the RTX 5090. This model achieves processing speeds of up to 250 tokens per second, making it suitable for individual developers and small-scale projects.
gpt-oss-120b: Optimized for professional-grade RTX PRO GPUs, this model caters to enterprise and research environments requiring higher computational power and scalability.

Both models support extended context lengths of up to 131,072 tokens, allowing them to handle complex reasoning tasks and process large-scale documents. This capability is particularly advantageous for applications such as legal document analysis, academic research, and other tasks requiring long-form comprehension and detailed analysis.

Technological Innovations Driving Efficiency

The gpt-oss models incorporate several technological advancements that enhance their performance and functionality:

MXFP4 Precision: The gpt-oss models are the first to support this precision format on NVIDIA RTX GPUs. MXFP4 improves computational efficiency while maintaining output accuracy, reducing resource consumption without compromising performance.

Mixture-of-Experts (MoE) Architecture: This architecture activates only the components of the model needed for a given task, minimizing computational overhead while maintaining high performance and ensuring efficient resource utilization, particularly for complex or specialized tasks.

Chain-of-Thought Reasoning: This feature enables the models to perform step-by-step logical analysis, improving their ability to follow instructions and solve intricate problems.
It enhances their effectiveness in real-world applications such as troubleshooting, decision-making, and problem-solving. These innovations collectively contribute to the models' ability to deliver high-speed, accurate results across a variety of use cases, making them versatile tools for developers and organizations alike.

Versatile Applications and Use Cases

The gpt-oss models are designed to support a wide range of applications and industries. Key use cases include:

Web Search and Information Retrieval: The models can process and summarize vast amounts of information, making them ideal for search engines and knowledge management systems.

Coding Assistance: Developers can use the models for code generation, debugging, and optimization, streamlining software development workflows.

Document Comprehension: With their extended context lengths, the models excel at analyzing lengthy documents such as legal contracts, research papers, and technical manuals.

Multimodal Input Processing: The ability to handle both text and image inputs broadens their applicability to tasks like image captioning, data analysis, and content generation.

The customizable context lengths allow users to tailor the models to specific requirements, whether summarizing extensive documents or generating detailed responses to complex queries. This adaptability makes the gpt-oss models suitable for both general-purpose use and specialized applications, from enterprise workflows to individual projects.
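The hardware tiers above (gpt-oss-20b for consumer RTX cards with at least 16GB of VRAM, gpt-oss-120b for RTX PRO-class hardware) suggest a simple selection rule. A minimal sketch; the 60GB cutoff for the 120b model is an illustrative assumption, since the article gives no VRAM figure for it:

```python
def pick_gpt_oss_model(vram_gb):
    """Suggest a gpt-oss variant for a given amount of GPU memory.

    The 16 GB floor for gpt-oss-20b comes from the article; the 60 GB
    threshold for gpt-oss-120b is an assumption for illustration only.
    """
    if vram_gb >= 60:
        return "gpt-oss-120b"   # RTX PRO-class hardware
    if vram_gb >= 16:
        return "gpt-oss-20b"    # consumer RTX cards (e.g. RTX 5090)
    return None                  # not enough VRAM for on-device use

print(pick_gpt_oss_model(24))   # an RTX 5090-class card -> gpt-oss-20b
```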
Developer Tools for Seamless Integration

To assist adoption and integration, OpenAI and NVIDIA have provided a comprehensive suite of developer tools. These resources simplify the deployment and testing of the gpt-oss models, ensuring accessibility for developers of varying expertise levels. Key tools include:

Ollama App: An intuitive interface for running and testing the models on NVIDIA RTX GPUs, allowing quick experimentation and deployment.

Framework: An open-source framework that supports collaboration and optimization, allowing developers to fine-tune the models for specific hardware configurations.

Microsoft AI Foundry Local: A set of command-line tools and software development kits (SDKs) designed for Windows developers, allowing seamless integration into existing workflows.

These tools empower developers to experiment with advanced AI solutions without requiring extensive expertise in AI infrastructure, fostering innovation and accessibility.

NVIDIA's Role in Advancing AI

The gpt-oss models were trained on NVIDIA H100 GPUs, using NVIDIA's state-of-the-art AI training infrastructure. Once trained, the models are optimized for inference on NVIDIA RTX GPUs, showcasing NVIDIA's leadership in end-to-end AI technology. This approach ensures high-performance AI capabilities on both cloud-based and local devices, making advanced AI more accessible to a broader audience. Additionally, the models use CUDA Graphs, a feature that minimizes computational overhead and enhances performance. This optimization is particularly valuable for real-time applications, where speed and efficiency are critical.
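Beyond its app interface, Ollama also exposes a documented local REST API (by default on port 11434, with an /api/generate route), which is how scripts typically drive a downloaded model. A hedged sketch that only builds the request; actually sending it assumes a running Ollama install with the model already pulled:

```python
import json
import urllib.request

def build_ollama_request(model, prompt, host="http://localhost:11434"):
    """Build (but do not send) a request to Ollama's local REST API.

    The endpoint and payload follow Ollama's documented /api/generate
    route; sending the request requires Ollama running locally.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("gpt-oss:20b", "Summarize MXFP4 precision in one sentence.")
print(req.full_url)  # http://localhost:11434/api/generate
```

To actually run it, `urllib.request.urlopen(req)` would return a JSON body with a `response` field, assuming the server is up and the model is installed.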
Open-Source Collaboration and Community Impact

The gpt-oss models are open-weight, allowing developers to customize and extend their capabilities. This openness encourages innovation and collaboration within the AI community, enabling the development of tailored solutions for specific use cases. NVIDIA has also contributed to open-source frameworks such as GGML, further enhancing the accessibility and performance of the gpt-oss models. These frameworks provide developers with the tools needed to optimize AI models for a variety of hardware configurations, from consumer-grade PCs to enterprise-level systems.

Empowering the Future of AI Development

The release of the gpt-oss models marks a pivotal moment in the evolution of AI technology. By harnessing the power of NVIDIA RTX GPUs, these models deliver exceptional performance, flexibility, and accessibility. Their open-weight nature, combined with robust developer tools, positions them as valuable assets for driving innovation across a wide range of applications. Whether for individual developers or large organizations, the gpt-oss models offer a practical and efficient solution for advancing AI-driven projects.

Filed Under: AI, Technology News, Top News Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, Geeky Gadgets may earn an affiliate commission. Learn about our Disclosure Policy.


Techday NZ
5 days ago
- Business
- Techday NZ
Shadow AI surge heightens enterprise security risks, study finds
Netskope research has highlighted a significant increase in generative AI (genAI) platform and AI agent usage in workplaces, amplifying data security concerns, especially through unsanctioned or "shadow AI" applications. The findings, detailed in the latest Netskope Threat Labs Cloud and Threat Report, show a 50% rise in genAI platform users among enterprise employees over the three months to May 2025. This increase comes as enterprises broadly enable sanctioned SaaS genAI apps and agentic AI but face growing security challenges as shadow AI persists.

Growth of shadow AI

The report indicates that while organisations continue efforts to safely adopt genAI across SaaS and on-premises environments, over half of all AI application adoption is now estimated to fall under the shadow AI category. These applications are not officially sanctioned by IT departments, raising concerns about uncontrolled access to sensitive data and potential compliance issues. GenAI platforms, which provide foundational infrastructure for organisations to develop bespoke AI applications and agents, are cited as the fastest-growing segment of shadow AI. In just three months, uptake among end-users rose by 50%, and network traffic linked to these platforms grew by 73%. In May, 41% of surveyed organisations were using at least one genAI platform, with Microsoft Azure OpenAI, Amazon Bedrock, and Google Vertex AI the most commonly adopted.

"The rapid growth of shadow AI places the onus on organisations to identify who is creating new AI apps and AI agents using genAI platforms and where they are building and deploying them," said Ray Canzanese, Director of Netskope Threat Labs. "Security teams don't want to hamper employee end users' innovation aspirations, but AI usage is only going to increase. To safeguard this innovation, organisations need to overhaul their AI app controls and evolve their DLP policies to incorporate real-time user coaching elements."
On-premises AI and agentic use

Organisations are increasingly exploring on-premises AI solutions, from deploying genAI through local GPU resources to integrating on-premises tools with SaaS applications. The report finds that 34% of organisations are using large language model (LLM) interfaces locally, with Ollama showing the highest adoption, followed by LM Studio and Ramalama at lower levels. Employee use of AI resources accelerates through downloads from AI marketplaces such as Hugging Face, used in 67% of organisations, suggesting widespread experimentation and tool-building among staff. AI agents, which automate tasks and access sensitive enterprise data, are also proliferating, with GitHub Copilot now used in 39% of organisations and 5.5% of organisations reporting on-premises deployment of agents built from popular frameworks.

"More organisations are starting to use genAI platforms to deploy models for inference because of the flexibility and privacy that these frameworks provide. They essentially give you a single interface through which you can use any model you want – even your own custom model – while providing you a secure and scalable environment to run your app without worrying about sharing your sensitive data with a SaaS vendor. We are already seeing rapid adoption of these frameworks and expect that to continue into the future, underscoring the importance of continuously monitoring for shadow AI in your environment," said Canzanese.

"More people are starting to explore the possibilities that AI agents provide, choosing to either do so on-prem or using genAI platforms. Regardless of the platform chosen, AI agents are typically granted access to sensitive data and permitted to perform autonomous actions, underscoring the need for organisations to shed light on who is developing agents and where they are being deployed, to ensure that they are properly secured and monitored. Nobody wants shadow AI agents combing through their sensitive data," Canzanese added.
Shadow AI agents and risks

The prevalence of shadow AI agents is a particular concern, as they act autonomously and can interact extensively with enterprise data. API traffic analysis revealed that 66% of organisations have users making programmatic calls to one third-party AI service and 13% to another, indicating high-volume programmatic access to third-party AI services.

"The newest form of shadow AI is the shadow AI agent. They are like a person coming into your office every day, handling your data, taking actions on your systems, all while not being background checked or having security monitoring in place. Identifying who is using agentic AI and putting policies in place for their use should be an urgent priority for every organisation," said James Robinson, Chief Information Security Officer.

Trends in SaaS genAI

Netskope's dataset now includes more than 1,550 genAI SaaS applications, a sharp increase from 317 in February. Organisations now employ about 15 distinct genAI apps on average, up two from earlier in the year. Monthly data uploaded to these applications also rose from 7.7 GB to 8.2 GB quarter on quarter. Security teams' efforts to enable and monitor these tools are credited with a shift towards purpose-built suites such as Gemini and Copilot, which are designed to integrate with business productivity software. However, general-purpose chatbot ChatGPT has seen its first decrease in enterprise adoption since tracking began in 2023. Meanwhile, other genAI applications, including Anthropic Claude, Perplexity AI, Grammarly, Gamma, and Grok, have all recorded gains, with Grok also appearing in the top 10 most-used apps list for the first time.

Guidance for security leaders

Given the accelerating complexity of enterprise AI use, Netskope advises security leaders to assess which genAI applications are in use, strengthen application controls, conduct inventories of any local infrastructure, and ensure continuous monitoring of AI activity.
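The programmatic-access finding above rests on spotting genAI API endpoints in network telemetry. A toy sketch of that idea; the domain list and log format are invented for illustration and bear no relation to Netskope's actual methodology:

```python
# Illustrative allowlist of genAI API hostnames to flag in proxy logs.
GENAI_API_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def flag_shadow_ai(log_lines):
    """Return the set of (user, domain) pairs hitting known genAI APIs.

    Assumes an invented log format: "<user> <domain> <method> <path>".
    """
    hits = set()
    for line in log_lines:
        user, domain = line.split()[:2]
        if domain in GENAI_API_DOMAINS:
            hits.add((user, domain))
    return hits

logs = [
    "alice api.openai.com POST /v1/chat/completions",
    "bob internal.example.com GET /",
    "carol api.anthropic.com POST /v1/messages",
]
print(flag_shadow_ai(logs))  # alice and carol are flagged; bob is not
```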
Collaboration with employees experimenting with agentic AI is also recommended to develop practical policies and mitigate risks effectively.


Geeky Gadgets
29-07-2025
- Geeky Gadgets
Ollama's Turbo Update: Speed, Power, and Privacy
What if interacting with artificial intelligence could be as seamless as chatting with a friend, with no technical hurdles and no steep learning curves? With Ollama's latest Turbo Update, that vision comes closer to reality. This release doesn't just tweak a few features; it reimagines how users engage with AI, blending speed, power, and accessibility into a single, cohesive platform. From a sleek new interface to new features like 'Turbo Mode,' Ollama is setting a bold new standard for AI accessibility. Whether you're a seasoned developer or an AI enthusiast just starting out, this update promises to make AI interaction faster, smarter, and more intuitive than ever before.

Sam Witteveen explores how Ollama's enhancements, such as streamlined file interaction, expanded model support, and robust privacy controls, are reshaping the landscape of AI tools. You'll discover how the platform's thoughtful design lowers barriers for users of all skill levels, while innovations like token-based plans and local storage options ensure flexibility and security. But what truly sets this update apart? It's not just about making AI easier to use; it's about empowering you to unlock its full potential. As we dive into the details, you might find yourself rethinking what's possible in your own AI workflows.

Ollama's Turbo Update

A Streamlined and Intuitive App Interface

The updated app interface replaces the previous menu bar system with a centralized, user-friendly design. This improvement allows you to manage multiple AI models directly within the app, eliminating the need for complex navigation. Whether you are new to AI or seeking a more efficient workflow, this streamlined interface is tailored to enhance usability and save time. By consolidating features into a single, intuitive hub, Ollama ensures that you can focus on exploring AI capabilities without unnecessary distractions.
Enhanced File Interaction and Customization

Ollama now supports interaction with various file types, including PDFs and images, allowing you to provide contextual inputs for more precise AI responses. This feature is particularly valuable for tasks requiring detailed analysis or specific references, such as document summarization or image-based queries. Additionally, the platform allows you to adjust model context size and storage preferences, offering greater control over how your data is processed and stored. These enhancements empower you to tailor the platform to your specific needs, whether for personal projects or professional applications.

Turbo Mode: Uniting Speed and Power

A standout feature of this update is 'Turbo Mode,' which provides access to high-performance cloud-based models like Kimi K2. Turbo Mode delivers faster processing speeds without requiring local GPU setups or complex API configurations. This feature is particularly useful for handling large-scale models or obtaining quick results for intricate queries. Whether you are a casual user exploring AI capabilities or a professional managing complex workflows, Turbo Mode ensures that you can achieve results efficiently and effectively.

Expanded Model Support and Flexibility

The platform now supports advanced AI models such as Qwen 3 and Kimi K2, while also allowing you to upload and test your own models. This flexibility caters to a wide range of users, from hobbyists experimenting with pre-trained models to developers working on custom AI solutions. By offering broader model support, Ollama ensures that you have the tools necessary to meet diverse requirements, whether for research, development, or creative exploration.

New Ollama AI Features July 2025 (video)
Token-Based Usage Plans for Versatility

To accommodate varying usage levels, Ollama employs a token-based system. The free plan provides 10,000 tokens every seven days, making it suitable for light to moderate use. For users with more intensive needs, the Pro plan offers extended usage limits. This flexible structure allows you to select a plan that aligns with your workload and budget, ensuring that the platform remains accessible to both casual users and professionals.

Commitment to Privacy and Data Control

Privacy remains a cornerstone of Ollama's platform. Conversations are not stored in the cloud, ensuring that your data stays secure. The app emphasizes local storage options, giving you full control over your data without relying on external servers. These measures reflect Ollama's dedication to user privacy and data protection, making it a trusted choice for those who prioritize security in their AI interactions.

Lowering Barriers to AI Interaction

By reducing reliance on command-line tools, Ollama's latest update lowers the barrier to entry for AI model testing and usage. The platform is designed to cater to a diverse audience, including developers, researchers, and AI enthusiasts. Its straightforward yet powerful interface ensures that you can explore AI capabilities with ease and efficiency, regardless of your technical expertise. This accessibility makes Ollama an ideal choice for anyone looking to harness the potential of AI.

Setting a New Standard for AI Accessibility

This update represents a significant milestone in making AI interaction more accessible and user-centric. By combining a redesigned interface, advanced features, and robust privacy controls, Ollama continues to distinguish itself as a leading platform for AI model testing and interaction. Whether you are working with local models or using cloud-based systems, Ollama's latest enhancements deliver a seamless, secure, and efficient experience.
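The free plan's quota described above (10,000 tokens every seven days) amounts to a rolling-window check. A minimal sketch, assuming a simple rolling-sum accounting model that the article does not specify:

```python
from datetime import datetime, timedelta

FREE_PLAN_TOKENS = 10_000       # from the article: free plan allowance
WINDOW = timedelta(days=7)      # from the article: per seven days

def tokens_remaining(usage, now):
    """usage: list of (timestamp, tokens_used) pairs.

    Counts only usage inside the trailing seven-day window; older
    entries have "rolled off". The accounting model is an assumption.
    """
    used = sum(t for ts, t in usage if now - ts <= WINDOW)
    return max(FREE_PLAN_TOKENS - used, 0)

now = datetime(2025, 7, 29)
usage = [
    (now - timedelta(days=1), 4_000),    # inside the window
    (now - timedelta(days=10), 9_000),   # rolled off
]
print(tokens_remaining(usage, now))  # 6000
```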
These improvements set a new benchmark for how users can engage with AI, empowering individuals and organizations to unlock the full potential of artificial intelligence. Media Credit: Sam Witteveen Filed Under: AI, Top News


Biz Bahrain
14-06-2025
- Biz Bahrain
New malware posing as an AI assistant steals user data
Kaspersky Global Research & Analysis Team researchers have discovered a new malicious campaign which is distributing a Trojan through a fake DeepSeek-R1 Large Language Model (LLM) app for PCs. The previously unknown malware is delivered via a phishing site pretending to be the official DeepSeek homepage, promoted via Google Ads. The goal of the attacks is to install BrowserVenom, a malware that configures web browsers on the victim's device to channel web traffic through the attackers' servers, allowing them to collect user data such as credentials and other sensitive information. Multiple infections have been detected in Brazil, Cuba, Mexico, India, Nepal, South Africa and Egypt.

DeepSeek-R1 is one of the most popular LLMs right now, and Kaspersky has previously reported attacks with malware mimicking it to attract victims. DeepSeek can also be run offline on PCs using tools like Ollama or LM Studio, and attackers exploited this in their campaign. Users were directed to a phishing site mimicking the address of the original DeepSeek platform via Google Ads, with the link showing up in the ad when a user searched for 'deepseek r1'. Once the user reached the fake DeepSeek site, a check was performed to identify the victim's operating system. If it was Windows, the user was presented with a button to download the tools for working with the LLM offline. Other operating systems were not targeted at the time of research. After clicking the button and passing a CAPTCHA test, a malicious installer file was downloaded and the user was presented with options to download and install Ollama or LM Studio. If either option was chosen, the malware was installed alongside the legitimate Ollama or LM Studio installer, bypassing Windows Defender's protection with a special algorithm. The procedure required administrator privileges on the Windows user profile; without them, the infection would not take place.
After the malware was installed, it configured all web browsers in the system to forcibly use a proxy controlled by the attackers, enabling them to spy on sensitive browsing data and monitor the victim's browsing activity. Because of its enforcing nature and malicious intent, Kaspersky researchers have dubbed this malware BrowserVenom.

'While running large language models offline offers privacy benefits and reduces reliance on cloud services, it can also come with substantial risks if proper precautions aren't taken. Cybercriminals are increasingly exploiting the popularity of open-source AI tools by distributing malicious packages and fake installers that can covertly install keyloggers, cryptominers, or infostealers. These fake tools compromise a user's sensitive data and pose a threat, particularly when users have downloaded them from unverified sources,' comments Lisandro Ubiedo, Security Researcher with Kaspersky's Global Research & Analysis Team.

To avoid such threats, Kaspersky recommends:
• Check the addresses of websites to verify that they are genuine and avoid scams.
• Download offline LLM tools only from official sources.
• Avoid using Windows on a profile with admin privileges.
• Use trusted cybersecurity solutions to prevent malicious files from launching.
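Kaspersky's first two recommendations boil down to checking a download link's hostname against known-official sources before trusting it. A toy sketch; the allowlist entries are assumptions, and real lookalike-domain detection is considerably harder than an exact-match check:

```python
from urllib.parse import urlparse

# Assumed-official hosts for illustration; verify against vendor sites.
OFFICIAL_HOSTS = {"ollama.com", "lmstudio.ai"}

def is_official(url):
    """True only if the URL's hostname is an allowlisted host or a
    subdomain of one. Lookalikes such as 'ollama-install.example.com'
    fail because they neither match nor end with '.ollama.com'."""
    host = urlparse(url).hostname or ""
    return host in OFFICIAL_HOSTS or any(
        host.endswith("." + h) for h in OFFICIAL_HOSTS
    )

print(is_official("https://ollama.com/download"))         # True
print(is_official("https://ollama-install.example.com"))  # False
```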


Zawya
04-04-2025
- Business
- Zawya
Rierino launches AI agent builder to power agents with full system awareness
Rierino, the next-generation low-code platform for enterprise innovation, announced today the launch of AI Agent Builder, a new capability designed to help organizations build and deploy intelligent agents that operate inside real systems, not just across conversations. Unlike traditional approaches that focus on prompts or pre-scripted flows, Rierino's AI Agent Builder allows teams to give agents secure access to backend logic, real-time workflows, and internal APIs, enabling actions like creating a purchase request, retrieving customer history, or triggering multi-step automation based on enterprise data.

'The missing piece in AI agent development isn't more intelligence. It's more structure,' said Berkin Ozmen, Co-Founder and CTO of Rierino. 'AI agents will transform the enterprise by executing real actions, governed by real logic, where business value is actually created. That requires infrastructure purpose-built for execution, not just conversation.'

A Foundation for Enterprise-Grade Agents

AI Agent Builder is not a standalone feature but a natural extension of Rierino's composable, low-code platform. With it, developers can transform any internal logic into agent-accessible capabilities governed by platform-level RBAC, validation rules, audit trails, and contextual schema definitions. Agents can invoke saga flows, Rierino's real-time, event-driven orchestration components, as native tools with clearly defined inputs and outputs. These flows eliminate the need for custom glue code or fragile integrations and make structured actions accessible to large language models (LLMs) by design. The platform supports integration with a wide range of LLM providers, including OpenAI, Google Gemini, Amazon Bedrock, Mistral, Anthropic, and on-prem deployments like Ollama or LocalAI, giving enterprises full flexibility over how and where their AI workloads run. Agents built with Rierino are also channel-agnostic by default.
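The 'native tools with clearly defined inputs and outputs' described above resemble the JSON-schema tool definitions used in LLM function calling. A hypothetical sketch; the tool name, fields, and validation logic are invented for illustration and are not Rierino's API:

```python
# Hypothetical function-calling-style tool definition for the
# "create a purchase request" example mentioned in the article.
purchase_request_tool = {
    "name": "create_purchase_request",
    "description": "Create a purchase request in the ERP system.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
            "cost_center": {"type": "string"},
        },
        "required": ["item", "quantity"],
    },
}

def validate_call(tool, args):
    """Reject agent calls missing required parameters before execution,
    a simplified stand-in for platform-level validation rules."""
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    return (len(missing) == 0, missing)

print(validate_call(purchase_request_tool, {"item": "laptop", "quantity": 2}))
```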
They can be accessed through Rierino's UI, exposed as APIs, or triggered by external events, enabling seamless deployment across chat interfaces, operational systems, or custom frontends. And because all logic is built on Rierino's microservice-based foundation, agent capabilities are modular, versioned, and reusable across teams and systems, ensuring long-term maintainability and scalability as business needs evolve.

From Prototypes to Production-Grade Agents

Most AI agent platforms today are optimized for experimentation, focused on prototyping flows, generating responses, or showing basic integrations. While that's helpful in the early stages, it falls short in real-world enterprise scenarios where agents must operate across multiple systems, comply with business policies, and deliver measurable outcomes. Rierino's AI Agent Builder is built for the next phase: production-grade deployment. It enables teams to move beyond pilots and proofs of concept by equipping agents with structured tools, secure runtime environments, and composable business logic. Agents aren't just asked to generate ideas; they're expected to pull real-time data, initiate multi-step workflows, and act within enterprise guardrails. This shift, from conversation to execution, is what turns AI from a novelty into a force multiplier for productivity, automation, and innovation at scale.

Not Just a Tool: An Agent Infrastructure Layer

While many platforms position agents as digital assistants or conversational layers, Rierino takes a fundamentally different approach: agents are infrastructure-level components that should be embedded, orchestrated, and governed like any other part of a modern enterprise system. AI Agent Builder is not a new direction; it is the natural evolution of Rierino's long-standing AI focus. As the first low-code platform to offer embedded AI capabilities, dating back to 2020, Rierino has consistently pushed beyond surface-level automation.
The 2023 launch of RAI, its embedded GenAI assistant, extended these capabilities into content, translation, and UI generation. AI Agent Builder now extends that same architectural depth to autonomous, action-driven agents. With Rierino, every workflow, API, or rule-based decision can be exposed as a tool an agent can invoke: governed, automatically versioned, and monitored for safe execution. This turns your internal architecture into an AI-ready surface where agents can operate with full trust and transparency. For organizations looking to scale AI safely and meaningfully, this isn't just another feature; it's a platform-level capability enabling agents to evolve as systems grow, maintain compliance as policies shift, and deliver real business impact without introducing chaos or risk. Rierino AI Agent Builder is now available to enterprise teams looking to bring scalable AI execution into their digital ecosystems.

About Rierino

Rierino is a next-generation technology company helping organizations accelerate digital transformation through low-code development, composable architecture, and embedded intelligence. Its platform empowers teams to create scalable microservices, orchestrate business logic, and build intelligent applications without black-box constraints. Rierino is backed by the Future Impact Fund and was named one of Fast Company's Top 100 Startups to Watch.