Is Apple Vision Pro worth it?
Short answer: Not quite yet. But it will be, more than you can imagine.

On January 19, 2024, I woke up early, opened the Apple Store on three devices, and spammed the Apple Vision Pro pre-order at precisely 8:00 AM EST. It shipped two weeks later, and I unboxed it in front of a crowd of 50 or so members from my club, the Society of PC Building, and the Gator VR Club. Since then, I’ve used it extensively every single day and given demos to dozens of people. I have a lot of thoughts on it—both as it exists today, and what it will become further down the road.
v1, 2024
The Vision Pro came in the largest box of any Apple product I’ve ever opened, containing the headset itself, the Light Seal, two Light Seal Cushions, the Solo Knit and Dual Loop bands, a USB-C cable and charging brick (a rare inclusion for Apple), and the infamous $19 Apple polishing cloth, complete with special Vision Pro branding. The headset itself is beautiful. It eschews the Quest’s white plastic for glass and metal, with a curved glass panel covering the EyeSight display on the front, a durable aluminum chassis, high-quality textiles for the Light Seal and cushions, and an adjustable Solo Knit Band whose fine 3D-knitted fabric envelops the back of your head like a hug. It even has orange accents (on the pull tab that detaches the bands), a nod to legendary Apple designer Jony Ive. Despite reviewers’ warnings, I found the headset comfortable, and I never had to use the ugly Dual Loop Band. The weight is tolerable too—definitely heavier than the Quest 3, but not painfully so. The external battery is annoying, but not terrible.
When I tried it on, I was floored by the quality of the tracking. Unlike the Meta Quest 3, you control the Vision Pro with your eyes. To select a UI element, you simply look at it and pinch your fingers together. Hardly anything could be more intuitive. The hand position tracking is excellent, too. The Quest often has difficulty recognizing when I’m grabbing a window or pointing at a specific element; the Vision Pro almost never does. The passthrough on the Vision Pro is definitely imperfect: it’s dimmer than the real world, and feels like looking at the world through an older iPhone camera, but the latency is ridiculously low. The Quest 3 has major problems with hands, faces, screens, and some other objects distorting the passthrough; the Vision Pro has almost none. Virtual windows look nearly real—you practically can’t see the pixels, and they stay almost perfectly locked in place. Once, I had some windows open in my dorm room, went down 14 floors to swap out my laundry (still wearing the headset), and when I got back to my room, my windows were exactly where I left them. The Quest 3 wins on just one element of immersion—field of view[1]. The Vision Pro’s FOV is tight enough that there’s a noticeable black band wrapped around what you see.
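For developers, the remarkable part is how little of this you have to handle yourself: visionOS resolves the gaze target and the pinch, and your app just receives an ordinary tap. Here’s a minimal SwiftUI sketch of what selection looks like (the view and labels are mine, not from any Apple sample code):

```swift
import SwiftUI

// Minimal visionOS sketch: the system handles gaze targeting and pinch
// detection, so looking at the button and pinching arrives as a plain tap.
struct SelectionDemo: View {
    @State private var count = 0

    var body: some View {
        VStack(spacing: 20) {
            Text("Pinched \(count) times")
            Button("Look at me and pinch") {
                count += 1  // fired by the gaze-plus-pinch "tap"
            }
            // The hover effect highlights whatever you're looking at;
            // apps never see raw eye-tracking data, only the resulting tap.
            .hoverEffect()
        }
        .padding()
    }
}
```

Apple keeps raw gaze data private to the system, which is part of why the interaction feels so effortless: there’s nothing for an individual app to get wrong.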
Once I calibrated it to my eyes and hands, signed in with my Apple ID, and finished setup, it was time to try the software. I spent a while just playing around with it before using it for anything serious. The virtual environments are imperfect but gorgeous, especially the Moon and Haleakalā. Some of the apps, like Sky Guide (planetarium) and JigSpace (3D object visualization), are awesome windows into the future of computing. I tried several iPadOS apps running on visionOS: it was cool to scroll through X or explore Apple Maps on windows as small as a newspaper or as large as a wall. One of my favorite things to do is FaceTime: though your friends see you as an uncanny 3D scan of your face, you see them as a window in your environment, a tiny hint of what it’ll be like to eventually render them into your environment fully. Spatial Videos are like this too—though they only take up a small part of your FOV and blur at the edges, you can see where they’re headed: the ability to relive your memories.
With the honeymoon phase over, I started to focus on more serious use-cases—mainly, work. This is where I started to run into problems. visionOS resembles iPadOS more than macOS. Its app ecosystem is sparse. Its input mechanisms (voice and virtual keyboard) are extremely slow. Most importantly, it lacks the million tiny things that create the seamless workflow you get with a real desktop operating system. visionOS is not yet ready for actual productivity. Fortunately, if you have a Mac, you can mirror the screen to the Vision Pro to get a monitor that can be anywhere and any size you want. Unfortunately, you only get one Mac monitor, and it has to be the same aspect ratio as your MacBook: no vertical monitors. Infuriatingly, your keyboard doesn’t show through virtual environments, so if you want to be fully immersed while working, you’d better know where every single key is. But you have to deal with it anyway, because macOS is so much better than visionOS for work that I ended up forgoing multiple virtual windows altogether and doing all my work in a single Mac window.
There’s one use-case where visionOS excels more than any other: watching videos. Folding laundry while watching YouTube on a TV-sized virtual window is cool, but even cooler is watching Dune in 4K resolution, with stereoscopic 3D, high-quality spatial audio, and a virtual screen the size of an IMAX theater. When you’re in Joshua Tree or White Sands, it looks cool but feels fake. When you’re in Cinema Mode on Apple TV, it really, honestly, feels like you’re in a movie theater. This is the Vision Pro’s killer app, the most mature part of Apple’s vision for VR so far, and it’s awesome.
So that’s the state of the Apple Vision Pro today. What can Apple (and Meta) do to bring VR into the future?
Hardware
As a v1 product, the Apple Vision Pro is excellent. But it clearly isn’t mature. It’s not as polished and perfect as the iPhone 15 Pro Max or M2 MacBook Pro—it’s more like the Macintosh in 1984 or the iPhone in 2007. Famously, the iPhone didn’t even launch with an App Store, a selfie camera, a GPS, 3G data, copy and paste, or video recording. We will undoubtedly look back on the Vision Pro with the same surprise.
The hardware of the Vision Pro is a good start, but there’s a long way to go. Even many non-conformist tech people balk at the idea of wearing ski goggles, especially in public. In order to achieve mass adoption, it will have to become much thinner and lighter—resembling glasses more than goggles. An early version of this is Apple’s EyeSight display, a core differentiator of their VR strategy: for the headset to disappear socially, the people around you need to be able to see your eyes. Marques Brownlee has an excellent video on form factor: he says there are two strategies. One is to start with the ideal form factor and improve the feature set as the hardware gets better. This is what the Meta Ray-Ban Smart Glasses are: simple glasses or sunglasses with nothing but cameras, a mic, speakers, and conversational AI built in. The other is to start with the ideal feature set and improve the form factor as the hardware gets better. This is what big, bulky headsets like the Meta Quest 3 and Apple Vision Pro are. The goal is to end up with something as thin, light, portable, and fashionable as the glasses (even while powered off) while being as powerful and full-featured as the goggles.
The form factor will have to improve, but so will the capabilities of the hardware. The end goal is total immersion—making the headset as indistinguishable from real life as high-quality headphones are from a stereo speaker system. The single best thing they can do for this is passthrough that looks identical to what you see with your eyes. Until direct optical overlay becomes possible, they’ll have to make do with video. They’ve already got the latency down. Now all that’s left is brightness, color, and resolution. They’ll also have to figure out occlusion, so there’s absolutely no distortion around your hands or other real objects when they’re overlaid onto virtual environments. Then there’s the issue of computing power. Unreal Engine 5 and other computer graphics engines are getting really good, and emerging technologies like OpenAI’s Sora are incredibly flexible, but the Vision Pro has nowhere near the computing power required to run either. To get around this limitation, they could allow users to connect an external GPU—which Apple should start manufacturing if they want to remain competitive with NVIDIA on hardware.
Software
visionOS is a good start, but it’s unfinished. Just as the iPad Pro still can’t truly replace the MacBook, visionOS won’t truly replace macOS until it changes. At least in the short term, it needs strong native mouse and keyboard support, and the ability to run any program that you can run on a Mac, without screen mirroring. Then there are the little things, built up over decades of refining and perfecting macOS, that visionOS can’t match: robust keyboard shortcuts, the right ratio of text size to screen size, different ways to do window management, more optimized apps, better file management, and so on. The single most important thing: workflow on visionOS should be as seamless, and demand as few brain cycles, as it does on macOS. At the very least, we need multiple Mac monitors and keyboard passthrough so we can use macOS while visionOS catches up.
As Cleo Abram points out, the real reason to care about the Apple Vision Pro is connecting with other people. Right now, there are very few features that allow you to do this—essentially only FaceTime. If you and another Vision Pro user are in the same room, you should be able to view virtual objects together. This can be as simple as watching a movie together in a virtual cinema or as complex as playing Dungeons and Dragons or another board game with virtual pieces. If you’re physically in different places, you should be able to see the world from their perspective. If they’re at a concert, you should be able to essentially “remote” into their headset and see it as they see it. You should also be able to enter a virtual environment with them. The ultimate prototype for this is VRChat, which has amassed millions of players over 10 years. Imagine a version of VRChat where you can have any photorealistic (or otherwise!) avatar, in any environment you want, doing anything you want. This is the future.
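Some of the plumbing for this already exists: SharePlay, built on Apple’s GroupActivities framework, is how FaceTime-based watch-togethers work today. As a rough sketch of where this could go, here’s the skeleton of a shared-viewing activity (the names and identifier are made up for illustration):

```swift
import GroupActivities

// Hypothetical SharePlay activity for watching a movie together in a
// virtual cinema. The identifier and title are illustrative, not real.
struct MovieNightActivity: GroupActivity {
    static let activityIdentifier = "com.example.movie-night"

    var metadata: GroupActivityMetadata {
        var meta = GroupActivityMetadata()
        meta.title = "Movie Night"
        meta.type = .watchTogether
        return meta
    }
}

// Starting the session invites everyone on the current FaceTime call.
func startMovieNight() async {
    do {
        _ = try await MovieNightActivity().activate()
    } catch {
        print("Could not start SharePlay session: \(error)")
    }
}
```

The hard, unsolved parts are everything around this: shared world-locked objects for people in the same room, photorealistic avatars, and “remoting” into someone else’s view.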
The Future Should Look Like The Future
Elon Musk has a saying that the future should look like the future. There are three distant-future technologies that Apple now has the opportunity to build: JARVIS from Iron Man, the heads-up display from most sci-fi, and the Holodeck from Star Trek. At the rate AI is advancing, JARVIS might be the easiest of the three. Legendary research scientist Andrej Karpathy, who recently left OpenAI, where he says he was “building a kind of JARVIS,” has written about his vision for an LLM-based operating system. Recent advances in speech synthesis (like ElevenLabs), ultra-low-latency voice infrastructure (like Retell), and small LLMs that can run on-device (like Llama 2 7B, Mistral 7B, and Gemma) suggest this isn’t technologically far away. Ideally, you should have an always-there assistant who can answer questions and generate content like ChatGPT, interact with your apps and services like Siri, and be personalized to you and your preferences.
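To make the shape of that concrete, here’s a purely hypothetical Swift skeleton of such an assistant. None of these types exist in any Apple SDK; they just map out the three roles described above (generation, tool use, and personalization):

```swift
// Purely hypothetical sketch of an "LLM as OS" assistant. No real model
// runtime or Apple API is used here; every type below is invented.
protocol AssistantTool {
    var name: String { get }
    func run(_ arguments: [String: String]) async throws -> String
}

struct Assistant {
    let onDeviceModel: String          // e.g. a small local model like Mistral 7B
    var tools: [AssistantTool]         // bridges into apps and services, Siri-style
    var userProfile: [String: String]  // preferences that personalize every reply

    func respond(to prompt: String) async throws -> String {
        // 1. Build a context window from the prompt, profile, and tool list.
        // 2. Run the local model; if it emits a tool call, dispatch it.
        // 3. Stream the answer back through low-latency speech synthesis.
        fatalError("Illustrative only: no model runtime is wired up here.")
    }
}
```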
The next technology that Apple should work on is a heads-up display. If we want to make Iron Man a reality, we need not just JARVIS, but the HUD. At a bare minimum, it should be possible to lock windows in a relative position so that they’ll move with you, rather than you moving past them, when you walk. Think FaceTime: you might want to be able to walk “with” a friend by having a window with their face follow your position. The concept of widgets on a phone or complications on a watch might be useful here. You could have persistent indicators in the corner of your field of view for the time, weather, number of steps, notifications, or anything else you might want.
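RealityKit already has a primitive pointing in this direction: head-targeted anchors. Here’s a minimal sketch of a persistent indicator, assuming it runs inside an immersive space (the offsets and text are my guesses, and a true head anchor is fully head-locked, so the lazy “follow” behavior of the FaceTime example would need extra smoothing logic on top):

```swift
import SwiftUI
import RealityKit

// Sketch of a head-anchored HUD element: the label follows your head
// instead of staying world-locked. Assumes an immersive space; the
// position offset and contents are illustrative guesses.
struct FollowingIndicator: View {
    var body: some View {
        RealityView { content in
            let anchor = AnchorEntity(.head)

            // Park a simple label half a meter ahead and slightly below
            // the line of sight, like a watch complication for your FOV.
            let label = ModelEntity(
                mesh: .generateText(
                    "12:41 | 3,284 steps",
                    extrusionDepth: 0.001,
                    font: .systemFont(ofSize: 0.02)
                )
            )
            label.position = [0, -0.15, -0.5]
            anchor.addChild(label)
            content.add(anchor)
        }
    }
}
```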
The hardest of the three is the Holodeck: a Star Trek technology that uses holograms to create a realistic 3D simulation of whatever you want, using voice commands. There are different routes to get there from today’s technology. One route is OpenAI’s Sora and other video generators, which are versatile and easy to use, but inaccurate. Another route is existing computer graphics/game engines, which are much more accurate, but more difficult (and likely more computationally intensive) to use. Eventually, you should be able to ask to be in a streetside café in Rome, or for a realistic 3D model of a Ferrari, and be able to accurately interact with whatever object or environment you’re given.
The goal is a pair of lightweight, stylish glasses more portable than a phone and more productive than a laptop; capable of overlaying content on the real world like a HUD or generating it like the Holodeck; and running an intelligent, powerful, personalized AI assistant. This is the future of personal computing. And it’s not too far from being the present.
[1] In total, the Quest 3 wins on three things that matter: weight (and no external battery), FOV, and the ability to play Beat Saber.