Hand tracking in visionOS
Updated for visionOS 1.0 beta 2
Hand tracking is a key feature of the Vision Pro, enabling the main user interaction with the device. It allows the device to keep track of a user's hand position, rotation and size (transform) in the scene, as well as each joint within the hand. With permission, an app can access this data to power its own features, which is what we'll dive into in this article.
The data we as app developers have access to is provided under a HandAnchor entity. This is a special type of Anchor that contains a number of important pieces of information. Chirality tells us whether this is the left or right hand. It contains a hand skeleton, revealing details about the joints in the hand. It also carries the base transform of the hand in world space.
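Putting that together, a minimal sketch of reading those pieces of information off a HandAnchor might look something like the below. The property names are as I read them from the current SDK headers, so treat them as provisional while visionOS is still in beta:

import ARKit

func inspect(_ handAnchor: HandAnchor) {
    // Chirality: is this the left or right hand?
    let whichHand = handAnchor.chirality

    // Base transform of the hand in world space
    let handTransform = handAnchor.originFromAnchorTransform
    print("\(whichHand) hand is at \(handTransform.columns.3)")

    // The skeleton is optional; it's nil when the joints aren't currently tracked
    if let skeleton = handAnchor.handSkeleton {
        let wrist = skeleton.joint(.wrist)
        print("wrist (relative to the hand): \(wrist.anchorFromJointTransform)")
    }
}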
In the WWDC session "Meet ARKit for spatial computing", we get (the only?) insight into what this skeleton is:
note: in visionOS beta 2, the “hand” prefix has been removed from these references
Each joint is attached to a parent joint, with `wrist` as the root. That root is the only gotcha with this hierarchy: all of the fingers and the thumb, but also the forearm, flow back to it. This means the parent of `forearmArm` is `forearmWrist`, whose parent is `wrist`, which is a bit counterintuitive.
For the anatomists among us, the split between the metacarpal and the knuckle may be confusing, as in everyday terms they refer to the same part of the hand. Here it seems Apple means the metacarpal bone and the knuckle joint respectively. It's also a bit different to the references used by the Vision (2D) version of hand tracking, which uses the actual joint names to reference the positions. The two can be mapped as below:
| visionOS (SkeletonDefinition.JointName) | Vision (VNHumanHandPoseObservation.JointName) |
| --- | --- |
| .indexFingerTip | .indexTip |
| .indexFingerIntermediateTip | .indexDIP |
| .indexFingerIntermediateBase | .indexPIP |
| .indexFingerKnuckle | .indexMCP |
| .indexFingerMetacarpal | No equivalent |
| .thumbTip | .thumbTip |
| .thumbIntermediateTip | .thumbIP |
| .thumbIntermediateBase | .thumbMP |
| .thumbKnuckle | .thumbCMC |
| .wrist | .wrist |
| .forearmWrist | No equivalent |
| .forearmArm | No equivalent |
Each of the fingers follows the same pattern as above, with 'index' replaced by 'middle', 'ring' and 'little' as appropriate. It's certainly a bit confusing to go from anatomically correct names to made-up landmarks, with the odd mixed-up reference to anatomy thrown in.
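If you're porting gesture code across from the Vision framework, a simple lookup table built from the mapping above is one option. This is purely illustrative, written against the HandSkeleton.JointName spelling in the current SDK headers; adjust if your beta exposes these under SkeletonDefinition.JointName as in the table:

import ARKit
import Vision

// Partial mapping from the table above (index finger, thumb and wrist).
// The other fingers follow the same pattern.
let visionOSToVision: [HandSkeleton.JointName: VNHumanHandPoseObservation.JointName] = [
    .indexFingerTip: .indexTip,
    .indexFingerIntermediateTip: .indexDIP,
    .indexFingerIntermediateBase: .indexPIP,
    .indexFingerKnuckle: .indexMCP,
    .thumbTip: .thumbTip,
    .thumbIntermediateTip: .thumbIP,
    .thumbIntermediateBase: .thumbMP,
    .thumbKnuckle: .thumbCMC,
    .wrist: .wrist
    // .indexFingerMetacarpal, .forearmWrist and .forearmArm have no Vision equivalent
]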
Tracking a Hand
How do we go about actually tracking a hand? Well, the first caveat is that without a device, you can't! Hand tracking isn't available in the simulator, and there's no other device that exposes a HandAnchor for us to play with, so this is all theoretical for now. Regardless, if you want to prepare your app for when it is possible, there are a few things to be aware of. First off, you need an ARKitSession, and then you need the user's permission to access hand tracking data through that session. This comes in the form of a .handTracking authorization request via ARKit. Note that, unlike permission prompts on other platforms, this request is optional: ARKit will automatically prompt the user if you leave it out, rather than crash outright. However, without this kind of check you won't know if someone has denied your request, and you won't be able to handle it appropriately.
let session = ARKitSession()

func requestAuth() async {
    // Returns a dictionary of [AuthorizationType: AuthorizationStatus]
    let request = await session.requestAuthorization(for: [.handTracking])
    for (authorizationType, authorizationStatus) in request {
        // Handle .allowed / .denied / .notDetermined here
        print("\(authorizationType): \(authorizationStatus)")
    }
}
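If you'd rather check the current status without triggering a prompt at all, the session also has a query-style call. A small sketch, reusing the session declared above and assuming queryAuthorization behaves like requestAuthorization minus the UI:

func checkAuth() async -> Bool {
    // Reports the current status without prompting the user
    let status = await session.queryAuthorization(for: [.handTracking])
    return status[.handTracking] == .allowed
}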
Once we have permission, we can add hand tracking to the session using a HandTrackingProvider, then asynchronously retrieve updates to the hand anchors as they become available.
class HandTracking {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()

    func runSession() async {
        do {
            // Start delivering hand tracking data
            try await session.run([handTracking])
        } catch {
            print("error starting session: \(error)")
        }
    }

    func processHandUpdates() async {
        // anchorUpdates is an async sequence of changes to the hand anchors
        for await update in handTracking.anchorUpdates {
            let handAnchor = update.anchor
            guard handAnchor.isTracked else { continue }
            // do something with hand/joints
        }
    }
}
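As for what "do something" might mean: each joint's transform is relative to the hand anchor, so to get a joint into world space you multiply the anchor's world transform by the joint's transform. A sketch (untested on device, for obvious reasons) that pulls out the index fingertip's position in world space:

import ARKit
import simd

func indexTipWorldPosition(of handAnchor: HandAnchor) -> SIMD3<Float>? {
    guard let skeleton = handAnchor.handSkeleton else { return nil }

    let indexTip = skeleton.joint(.indexFingerTip)
    guard indexTip.isTracked else { return nil }

    // world-from-joint = world-from-hand * hand-from-joint
    let worldFromJoint = handAnchor.originFromAnchorTransform * indexTip.anchorFromJointTransform
    let position = worldFromJoint.columns.3
    return SIMD3<Float>(position.x, position.y, position.z)
}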
Limitations
We've already come across one limitation of hand tracking: it's not available in the simulator at present. The second limitation is that your app has to be running in its own full space, not in the shared space alongside other apps. This means your app needs to be the only one in focus in order to access any hand tracking data (as well as other types of ARKit scene data). If you had ideas about a fully gesture-based todo app, or an always-open music player you could control with gestures whilst working in other apps, that won't be possible. It makes sense, though: you could easily have multiple apps with conflicting gestures, and it would be a real drain on system resources.
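For reference, a full space is what you get by declaring an ImmersiveSpace scene and opening it, rather than running purely in a WindowGroup. A bare-bones sketch, where ContentView and ImmersiveView are placeholder names for your own views:

import SwiftUI

@main
struct HandTrackingApp: App {
    var body: some Scene {
        // Regular 2D window for the app's UI
        WindowGroup {
            ContentView()
        }

        // Hand tracking (and other ARKit data) is only delivered while an
        // immersive space like this one is open and the app is in focus
        ImmersiveSpace(id: "HandTrackingSpace") {
            ImmersiveView()
        }
    }
}

The space itself is opened from the window using the openImmersiveSpace environment action.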
What is Possible?
So what can you do with hand data? Well, custom gestures are one, as shown in the Happy Beam sample code provided by Apple. It's a bit convoluted right now: you have to extract each joint position, calculate distances between them, and use that to determine when a gesture has been made, but it is possible (see the sketch below). An extension of this is direct interaction with objects in the spatial world: picking up objects or applying forces is possible by attaching physics objects directly to your hands.
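To give a flavour of that convolution, here's a rough pinch check in the same spirit as Happy Beam (my own sketch, not Apple's implementation): grab the thumb and index fingertips, move them into world space as before, and compare the distance between them against an arbitrary threshold.

import ARKit
import simd

func isPinching(_ handAnchor: HandAnchor) -> Bool {
    guard let skeleton = handAnchor.handSkeleton else { return false }

    let worldFromHand = handAnchor.originFromAnchorTransform
    let thumbTip = worldFromHand * skeleton.joint(.thumbTip).anchorFromJointTransform
    let indexTip = worldFromHand * skeleton.joint(.indexFingerTip).anchorFromJointTransform

    let thumbPosition = SIMD3<Float>(thumbTip.columns.3.x, thumbTip.columns.3.y, thumbTip.columns.3.z)
    let indexPosition = SIMD3<Float>(indexTip.columns.3.x, indexTip.columns.3.y, indexTip.columns.3.z)

    // 2cm threshold chosen arbitrarily for illustration
    return simd_distance(thumbPosition, indexPosition) < 0.02
}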
Finally, attaching 3D models or 2D overlays to the hand opens up other possibilities, particularly for games. Who wouldn't want a Tony Stark-style power gauntlet to control?
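With the data we already have, one way to do that is simply to keep a RealityKit entity glued to a joint from inside the update loop. A sketch, where gauntlet stands in for whatever ModelEntity you've already added to your RealityView content:

import ARKit
import RealityKit

func updateGauntlet(_ gauntlet: ModelEntity, from handAnchor: HandAnchor) {
    guard let skeleton = handAnchor.handSkeleton else { return }

    // Pin the model to the wrist joint, in world space
    let worldFromWrist = handAnchor.originFromAnchorTransform *
                         skeleton.joint(.wrist).anchorFromJointTransform
    gauntlet.setTransformMatrix(worldFromWrist, relativeTo: nil)
}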
It will be interesting to see what Apple has planned for developers to access these features without having a Vision Pro to test with, as it seems like it will be difficult otherwise. In the meantime, if you want to simulate what it could be like, see my other article about faking hand tracking in the visionOS Simulator…