Many people like me have struggled with getting audio to work in iOS. Allow me to drop some knowledge. I’m using SoundFX but this applies to any audio plugin.
First, make sure your audio files have at least 0.5 seconds of silence at the beginning of the sound. I find that’s how long it takes the iPhone to turn on the speaker.
Next, unfortunately, Apple requires the user to interact with the phone before any sound can play, and the sound has to start right after that interaction (within about 600ms is what I’ve observed in Bubble). So any sound that starts playing more than 600ms after the user’s last tap on a button WILL RESULT IN NO SOUND because iOS suppresses it. This restriction applies even if the user has installed your app as an iOS PWA (whereas on Android, the restriction is relaxed when the user installs your PWA).
Fortunately there is a hack to overcome this “auto-play” limitation in iOS (unrelated to SoundFX).
- Preload ALL your sounds on page load
- Here’s the hack – As soon as the user taps any button, use that opportunity to “initialize” every sound, using a workflow with Play and then immediately Pause for EVERY sound (along with the other actions in your app that button triggers)
To repeat, after page load, the user MUST tap a button or otherwise interact somewhere, and you use that opportunity to initialize the sounds. In other words, it is not possible to auto-play a sound without your user tapping something first.
- Now all your sounds are preloaded and initialized, ready to auto-play anytime without further user interaction, using the Play action
Yes, this is a pain to set up and a drag on resources, especially if you have a lot of sounds. But this is the only way to do auto-play on iOS, whether or not you use this plugin.
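For anyone curious what the preload-and-initialize steps look like outside Bubble’s workflow editor, here is a rough sketch in plain TypeScript. This is not what SoundFX does internally. The names `PlayableSound` and `unlockSounds` and the file names in the comments are made up for illustration, and I’m assuming a standard browser audio element (which has `play()`, `pause()` and `currentTime`) fits this shape.

```typescript
// Minimal shape we need from a sound. A browser HTMLAudioElement is assumed
// to satisfy this; the interface name itself is hypothetical.
interface PlayableSound {
  play(): Promise<void>;
  pause(): void;
  currentTime: number;
}

// Call this from inside a tap/click handler. play() is invoked synchronously
// for every sound so the calls happen during the user interaction; each sound
// is paused and rewound as soon as its playback has actually started.
// Resolves with the number of sounds that were initialized successfully.
function unlockSounds(sounds: PlayableSound[]): Promise<number> {
  const attempts = sounds.map((sound) =>
    sound.play().then(
      () => {
        sound.pause();
        sound.currentTime = 0; // rewind so a later play() starts from the top
        return true;
      },
      () => false, // the browser refused this one; report it instead of throwing
    ),
  );
  return Promise.all(attempts).then(
    (results) => results.filter((ok) => ok).length,
  );
}

// Hypothetical browser wiring (not executed here):
//   const sounds = ["ding.mp3", "buzz.mp3"].map((url) => new Audio(url)); // preload on page load
//   document.addEventListener("click", () => { unlockSounds(sounds); }, { once: true });
```

In Bubble terms, the `click` listener is your button workflow, and the play-then-pause inside `unlockSounds` is the Play action followed by the Pause action for every sound.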
Unfortunately, if your app has too many sounds to preload and initialize them all, or your app generates audio files on the fly (e.g. text to speech), then I believe it is impossible for your app to auto-play audio on iOS (in other words, impossible without having an immediate user interaction). If you’re doing text to speech on the fly, there is probably not enough time for the server to reliably get the data out and back within the 600ms threshold.
For more info on this hack read: