Machine learning platform Hugging Face has released an iOS app that makes sense of the world around you as seen by your iPhone's camera. Just point it at a scene, or snap a photo, and it will deploy an AI to describe it, identify objects, perform translation, or pull text-based details.

Called HuggingSnap, the app takes a multi-model approach to understanding the scene around you as an input, and it's now available for free on the App Store. It is powered by SmolVLM2, an open AI model that can handle text, images, and video as input formats.

The overarching goal of the app is to let people learn about the objects and scenes around them, including plant and animal recognition. The idea is not too different from Visual Intelligence on iPhones, but HuggingSnap has a crucial leg-up over its Apple rival.

It doesn’t require internet to work

All it needs is an iPhone running iOS 18 and you're good to go. The UI of HuggingSnap is not too different from what you get with Visual Intelligence. But there's a key difference here.

Apple relies on ChatGPT for Visual Intelligence to work. That's because Siri is currently not capable of acting like a generative AI tool, such as ChatGPT or Google's Gemini, both of which have their own knowledge bank. Instead, it offloads all such user requests and queries to ChatGPT.

That requires an internet connection, since ChatGPT can't work in offline mode. HuggingSnap, on the other hand, works just fine offline. Moreover, an offline approach means no user data ever leaves your phone, which is always a welcome change from a privacy perspective.

What can you do with HuggingSnap?

HuggingSnap is powered by the SmolVLM2 model developed by Hugging Face. So, what can this model running the show behind this app accomplish? Well, a lot. Aside from answering questions based on what it sees through an iPhone's camera, it can also process images picked from your phone's gallery.

For example, show it a picture of any historic monument, and ask it to give you travel suggestions. It can understand the data appearing on a graph, or make sense of a photo of an electricity bill and answer queries based on the details it has picked up from the document.
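To give a concrete sense of what that kind of image Q&A looks like outside the app, here is a minimal sketch using SmolVLM2 through Hugging Face's transformers library in Python. The checkpoint name, image URL, and prompt are illustrative assumptions, not details taken from HuggingSnap itself:

```python
# Minimal sketch: asking SmolVLM2 a question about an image.
# Assumes a recent transformers release with SmolVLM2 support.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu")

# Chat-style message mixing an image with a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/monument.jpg"},  # placeholder
            {"type": "text", "text": "What landmark is this, and is it worth visiting?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

HuggingSnap itself runs the model on-device rather than through a Python stack like this, but the inputs and outputs are conceptually the same: an image plus a question in, a text answer out.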

It has a lightweight architecture and is particularly well-suited for on-device applications of AI. On benchmarks, it performs better than Google's competing open PaliGemma (3B) model and rubs shoulders with Alibaba's rival Qwen AI model with vision capabilities.

The biggest advantage is that it requires fewer system resources to run, which is particularly important in the context of smartphones. Interestingly, the popular VLC media player is also using the same SmolVLM2 model to generate video descriptions, letting users search through a video using natural language prompts.

It can also intelligently extract the most important highlight moments from a video. "Designed for efficiency, SmolVLM can answer questions about images, describe visual content, create stories grounded on multiple images, or function as a pure language model without visual inputs," says the app's GitHub repository.
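Those video features map onto the same multimodal chat interface. As a hedged sketch, the snippet below asks SmolVLM2 to describe a local clip; the "video" content type, file path, and decoding requirements are assumptions based on the model's documented capabilities, not anything HuggingSnap or VLC exposes:

```python
# Hedged sketch: video description with the same assumed SmolVLM2 checkpoint.
# Video decoding may require extra packages (e.g. pyav) alongside transformers.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "clip.mp4"},  # placeholder local file
            {"type": "text", "text": "Describe the most important moments in this video."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```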