visionOS 2.0 at WWDC 2024 Predictions
We’re a few months away from WWDC 2024 (confirmed for 10–14 June), so that means it’s prediction time. Or as I like to call it, the time to cross your fingers and hope that the talented folk at Apple have been toiling away at the same issues you’ve been wrestling with for the last six months.
Here’s my list of things I’d like to see in the next version of visionOS. Hopefully 2.0 will be announced, or at least previewed, at WWDC this summer. Most of these come directly from my experience developing for visionOS, so if you have other things you’d like to see, I’d love to hear about them.
Window Management
This is the top user request if you’ve read any reviews or listened to any podcasts on the topic. It’s a given that Apple will do something to address the window issues from a user’s perspective in the next version of the platform, and hopefully open up the Mac experience with some new features (multiple screens, dragging windows out of the Mac “window”), but I’d also like to see window management addressed from the developer’s perspective.
Shared Space
There’s too much of a disconnect between what’s possible in the Shared Space and an Immersive Space for an app. World anchoring and plane detection are great features, but they should be available whilst an app is being used in the Shared Space, not restricted to the single open foreground app as is the case now. This would open up the ability to have a permanent clock on the wall, or AR-driven experiences next to Safari and your note-taking tools rather than instead of them. The multi-app paradigm is a major use case of the Vision Pro, but this restriction is holding it back.
Windows everywhere
I bet there’s a lot of custom code out there for any app that manages multiple windows, trying to determine what is currently open, and what should be opened next. Allowing the developer to influence what happens if their main window is closed and the user tries to re-open the app (hint: it shouldn’t just open the tiny settings panel they happened to open and lose behind the couch) without having to resort to ugly hacks would be very welcome.
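To give a flavour of what that bookkeeping looks like today, here’s a minimal sketch of the kind of tracking an app ends up doing itself. The window IDs and the SessionState type are just illustrative names of my own, not anything visionOS provides.

```swift
import SwiftUI
import Observation

// Manual bookkeeping of which windows are currently open, because the
// system won't tell you, or let you influence, what re-opens next.
@Observable
final class SessionState {
    var openWindows: Set<String> = []
}

@main
struct ExampleApp: App {
    @State private var session = SessionState()

    var body: some Scene {
        // The main window the app would prefer to re-open.
        WindowGroup(id: "main") {
            Text("Main content")
                .onAppear { session.openWindows.insert("main") }
                .onDisappear { session.openWindows.remove("main") }
        }

        // A small auxiliary panel that the system may happily re-open
        // on its own if it was the last window left standing.
        WindowGroup(id: "settings") {
            Text("Settings")
                .onAppear { session.openWindows.insert("settings") }
                .onDisappear { session.openWindows.remove("settings") }
        }
    }
}
```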
Volumetric windows either need to go or be improved with new features. Currently I can’t see any benefit they have over a regular window, apart from (I think) the ability to add more depth. Until they can either interact more with the real world, or gain new possibilities like the ability to hide the window bar, they’re a bit redundant. Perhaps the volumetric window paradigm can be repurposed as the widget system that is sorely lacking in visionOS 1.x.
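For context, declaring one today is just a window style and a physical default size; the GlobeScene and GlobeView names below are placeholders of my own.

```swift
import SwiftUI

// A volumetric window: the same WindowGroup, with a volumetric style
// and a default size expressed in real-world units.
struct GlobeScene: Scene {
    var body: some Scene {
        WindowGroup(id: "globe") {
            GlobeView()
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)
    }
}

// Placeholder content; in a real app this would be a RealityView.
struct GlobeView: View {
    var body: some View {
        Text("3D content goes here")
    }
}
```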
Lastly, how windows from different apps work together is a little problematic at the moment. The ability to place or dock windows side by side is hampered by one of the windows being made transparent when they overlap each other. I’d like this effect to be toned down, or perhaps for developers to be able to disable it on a per-window basis. This would allow apps to be used more closely together and to enhance each other, rather than feeling like it’s one or the other. I know there’s space “all around you” to place windows, but in reality most users are putting a few windows in front of them. Why not take it one step further and allow windows to snap to each other when they get close and act as a window group, locked together from then on?
Space/Object Recognition
The Vision Pro is very restrictive about what it allows developers to know about the environment the user is in. There’s no direct access to the camera feed, nor the ability to run object recognition. There are obvious security and privacy issues with knowing what books are on someone’s shelf, but at the same time the ability to build upon the data coming from the cameras for object recognition and interactivity is required to unlock some new classes of experience. I hope Apple finds a way to do this safely and we see some of these restrictions lifted this year.
I’d also like to see improvements to the auto anchoring system, where you can ask the system to put an anchor on a table for you, which lets you place objects in the environment without revealing where that table is. There’s no control for the developer or the user over which table in the room is chosen, nor does the system give you enough information about that anchor to place the object correctly on the surface. The system will also happily locate, say, a wall for you, but then acts as if that’s the only wall in existence, adding no other walls to the environment, so you just have to hope the user is looking at the right place when they open your app. I’m not sure what I can learn from a user who has four walls in their environment rather than one, but perhaps it’s not about what I can do with that information, but what nefarious things a bad actor could do (fingerprinting?).
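For anyone who hasn’t used it, the anchoring I’m describing looks roughly like this: you describe the kind of surface you want and RealityKit picks one for you, telling you very little in return. The view and entity names here are my own.

```swift
import SwiftUI
import RealityKit

struct TableAnchoredView: View {
    var body: some View {
        RealityView { content in
            // Ask for any horizontal surface classified as a table,
            // at least 30cm x 30cm. Which table gets picked, and where
            // it actually is, stays hidden from the app.
            let tableAnchor = AnchorEntity(
                .plane(.horizontal,
                       classification: .table,
                       minimumBounds: SIMD2<Float>(0.3, 0.3))
            )

            // A placeholder object to sit on whichever table is chosen.
            let box = ModelEntity(
                mesh: .generateBox(size: 0.1),
                materials: [SimpleMaterial(color: .blue, isMetallic: false)]
            )
            tableAnchor.addChild(box)
            content.add(tableAnchor)
        }
    }
}
```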
RealityKit/ARKit Improvements
RealityKit is clearly the direction Apple is taking its 3D rendering, based on a similar philosophy to SwiftUI: a declarative way to work in 3D. However, like SwiftUI when it first launched, it’s a little rough around the edges and missing some things, or has made some things harder than they were in the SceneKit world. For example, there’s no access to body tracking apart from the user’s hands, placing anchors at a longitude/latitude isn’t supported, and materials are powerful but complex, particularly the material graph in Reality Composer Pro. General improvements and feature parity between RealityKit/SceneKit/ARKit across operating systems would help more developers bring their existing apps to visionOS with minimal re-writes. The physics system is also a bit of a black box, excludes particles, and is not as reliable or as realistic as its SceneKit counterpart.
One area that I believe would help is some kind of custom gesture system. It’s possible to know where each joint of the hand is and react to particular joint positions that you very specifically program for, but if you look at the sample code for Happy Beam you’ll see how much of an overhead this is to implement. Now perhaps custom gestures should generally be avoided, but there are some situations where they make sense (mainly games), and having an API that makes this process a lot easier to implement, perhaps declaratively, would be welcome.
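To illustrate the overhead, here’s a rough sketch of the per-joint work involved in something as simple as a pinch check using ARKit’s hand tracking, assuming hand-tracking authorisation has already been granted; the 2cm threshold and function name are my own choices, not a standard.

```swift
import ARKit
import simd

// The per-joint work currently needed for a "custom gesture": run a
// hand-tracking provider, pull out individual joints, and do the
// geometry yourself. Here, a simple pinch check.
func watchForPinches() async throws {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()
    try await session.run([handTracking])

    for await update in handTracking.anchorUpdates {
        let hand = update.anchor
        guard hand.isTracked, let skeleton = hand.handSkeleton else { continue }

        // Joint transforms are relative to the hand anchor, so convert
        // both fingertips into world space before measuring the distance.
        let originFromAnchor = hand.originFromAnchorTransform
        let thumb = originFromAnchor * skeleton.joint(.thumbTip).anchorFromJointTransform
        let index = originFromAnchor * skeleton.joint(.indexFingerTip).anchorFromJointTransform

        let thumbPosition = SIMD3<Float>(thumb.columns.3.x, thumb.columns.3.y, thumb.columns.3.z)
        let indexPosition = SIMD3<Float>(index.columns.3.x, index.columns.3.y, index.columns.3.z)

        // Treat fingertips within ~2cm of each other as a pinch.
        if simd_distance(thumbPosition, indexPosition) < 0.02 {
            print("Pinch detected on \(hand.chirality) hand")
        }
    }
}
```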
Whilst on the topic of hands, the current lag in the hand tracking system makes a lot of use cases difficult or infeasible, and I’m sure it’s the reason plenty of apps and games haven’t even tried to launch in its current state.
Simulator Testing
The simulator is a great starting point for testing Vision Pro content; in fact the Previews within Xcode work surprisingly well for such a complex system and are really handy for iterative development. It’s always going to be hard to replicate the actual experience of using the Vision Pro within a simulator, but there are still areas where the simulator could improve.
Firstly, the simulator doesn’t support any of the detection services, such as plane detection, image tracking or hand tracking. Frustratingly, it’ll show you planes and the world mesh in the debug settings, but they’re seemingly invisible to the frameworks. Having these available would ease one of the biggest blockers in developing for visionOS, particularly as there’s still only one country in the world that has access to devices right now.
Device Availability
This leads us on to the biggest thing Apple could do: open up sales of the device around the world or, failing that, allow developers to purchase devices without the restrictions that come with it being tied to the US. If nothing else happens but this at Apple’s Worldwide Developers Conference, it will make a big difference to developers around the world.
Now the way this usually works is you get 10% of the things you hoped for, but then another 50% of things you never knew you needed but are glad to have. Let’s see what happens in a few weeks’ time, but I know what I’d like to see most 🥽🌏…