External API
The content on this page is exclusive to the Moonshot version.

External API #

You can directly control MMDAgent-EX from an external program via socket communication. In addition to the standard message send/receive control, the Moonshot version provides an extra external API for using CG agents as avatars. Below is an explanation of how to use it.

Connecting to MMDAgent-EX #

To use the external API from an external program, connect to MMDAgent-EX via a socket. For how to establish the socket connection, see the page “Control via Socket Connection”.

Use WebSocket connections. Some features are not supported with TCP/IP connections.

When a connection is established, MMDAgent-EX issues the following message. The message is also sent to the connected peer, so the peer can detect that the connection has been established by capturing it.

REMOTE_EVENT_CONNECTED|peer_host_info

The following message is issued when a connection is lost. The disconnected peer cannot receive it, since it is issued after the connection has been lost; however, any other program connected to MMDAgent-EX can detect that a client has been lost by capturing this message.

REMOTE_EVENT_DISCONNECTED|peer_host_info
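
As a minimal sketch, connecting from Python with the websockets package might look like this (the host, port, and path are assumptions; use the values configured on your MMDAgent-EX side):

import asyncio
import websockets  # pip install websockets

async def main():
    # Address is an assumption; match your MMDAgent-EX configuration
    async with websockets.connect("ws://localhost:9001/") as ws:
        # Capture the notification that the connection has been established
        msg = await ws.recv()
        print(msg)  # e.g. "REMOTE_EVENT_CONNECTED|..."

asyncio.run(main())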

Specifications and Messages #

The specification and the list of messages that can be sent from an external program to MMDAgent-EX are as follows:

  • Always append \n at the end of the message.
  • For sending multiple messages in one transmission, append “\n” at the end of each message, like Message1\nMessage2\n (a sending helper following these rules is sketched after the list below).
  • Use binary transmission mode of WebSocket.
  • Settings will be reset when the socket is disconnected.
  • Most of the messages below are for avatar usage only and will not be passed to the internal message queue. Any message not in the following list is accepted and passed to the internal message queue, as with a normal socket connection.
__AV_SETMODEL,model_alias_name
__AV_AUTOCALIBRATE,{true|false}
__AV_AUTORETRACT,{true|false}
__AV_START
__AV_END
__AV_RECALIBRATE
__AV_ACTION,idx
__AVCONF_ALLOWFARCLOSEMOVE,{true|false}
__AV_TRACK,x,y,z,rx,ry,rz,eyeLrx,eyeLry,eyeLrz,eyeRrx,eyeRry,eyeRrz,flag
__AV_ARKIT,shape=rate,shape=rate,…
__AV_AU,num=rate,num=rate,…
__AV_EXBONE,name,x,y,z,rx,ry,rz,name,x,y,z,rx,ry,rz,…
__AV_EXBONEQ,name,x,y,z,rx,ry,rz,rw,name,x,y,z,rx,ry,rz,rw,…
__AV_EXMORPH,name=rate,name=rate,…
__AVCONF_DISABLEAUTOLIP,{NO|ARKIT|AU|ARKIT+AU|ALWAYS}
SNDSTRM
SNDFILE
SNDBRKS
SNDOPUS
SNDNPCM
SNDxxxx(body)
AVATAR_LOGSAVE_START|logfile.txt
AVATAR_LOGSAVE_STOP
AVATAR_EVENT_IDLE|START
AVATAR_EVENT_IDLE|STOP
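
As a minimal sketch of a sender that follows the rules above (one trailing \n per message, binary WebSocket frames; ws is assumed to be an open connection as in the earlier example):

async def send_messages(ws, *messages):
    # Append "\n" to each message and transmit them in one binary frame
    payload = "".join(m + "\n" for m in messages).encode("utf-8")
    await ws.send(payload)  # a bytes payload is sent as a binary frame

# Usage: await send_messages(ws, "__AV_SETMODEL,0", "__AV_START")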

Basics for Tracking #

Within this API, the “tracking messages” (__AV_TRACK, __AV_ARKIT, __AV_AU, __AV_EXBONE, __AV_EXMORPH) directly control the CG agent. Here are some basics you should know when using these tracking messages.

  • Tracking is applied only to the model specified by the __AV_SETMODEL message. Specify which model to control with __AV_SETMODEL before starting tracking.
  • Tracking takes effect after __AV_START is sent and ends when __AV_END is sent. Tracking messages after __AV_END are ignored.
  • Tracking control precedes autonomous actions: when a tracking movement collides with an autonomous motion on a bone or a morph, the autonomous movement is disregarded and the tracking movement is applied.
  • A tracking message gives a target parameter to MMDAgent-EX, and MMDAgent-EX moves the model toward the specified parameter. You can send messages successively, at up to 60 fps, for real-time control.
  • “Auto calibration” for head tracking is enabled by default. When enabled, MMDAgent-EX uses the first 10 arriving __AV_TRACK messages to calibrate the orientation of the user’s face. You can disable or enable this feature with the __AV_AUTOCALIBRATE message.
  • “Auto retraction” while there is no signal is enabled by default. When enabled, if no tracking message is received for a while (default is 1 second), motion control stops temporarily and resumes at the next tracking message. When disabled, the last tracking state is always maintained. You can disable or enable this feature with the __AV_AUTORETRACT message.
  • MMDAgent-EX issues “avatar control idle” messages: it issues AVATAR_EVENT_IDLE|START if no message has been received for a while, and AVATAR_EVENT_IDLE|STOP when a message arrives after that. The waiting time before AVATAR_EVENT_IDLE|START is 15 seconds by default and can be changed by specifying Plugin_Remote_Idle_Timeout_Second= in the .mdf file. A minimal control session combining these rules is sketched below.
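
As a sketch, a minimal control session combining these rules might look like this (the address, model alias, and target values are placeholder assumptions):

import asyncio
import websockets

async def tracking_session():
    async with websockets.connect("ws://localhost:9001/") as ws:
        await ws.send(b"__AV_SETMODEL,0\n")   # select the target model first
        await ws.send(b"__AV_START\n")        # enter tracking control
        for _ in range(60):                   # one second of tracking at 60 fps
            rx = 0.1                          # placeholder: slight head pitch (radian)
            await ws.send(f"__AV_TRACK,0,0,0,{rx},0,0,0,0,0,0,0,0,0\n".encode())
            await asyncio.sleep(1 / 60)
        await ws.send(b"__AV_END\n")          # leave tracking control

asyncio.run(tracking_session())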

Details of each message #

Start, End, Settings, and Logging #

__AV_SETMODEL,alias_name #

Specifies the model to be operated on by the external API. All external API commands are applied to the model specified by this message. Be sure to send this before starting tracking.

__AV_SETMODEL,0
  • Specify the model by its alias name.
  • If the alias name is not found, this message is ignored.
  • Even during tracking control, it is possible to change the target model by this message at any time.
  • Only one model can be operated on at a time.

__AV_AUTOCALIBRATE #

Disable/Enable automatic head orientation calibration.

“Automatic head orientation calibration” is a feature that calibrates the relative orientation of the operator’s face and the webcam in face tracking.

When enabled, the first 10 __AV_TRACK messages are saved without being applied, and their average is used as the reference position of the face for the subsequent __AV_TRACK messages.

When disabled, the values sent in __AV_TRACK are applied as they are, without modification.

This feature is enabled by default, and can be disabled or enabled by sending the following messages:

To disable auto calibration,

__AV_AUTOCALIBRATE,false

To enable auto calibration,

__AV_AUTOCALIBRATE,true

__AV_AUTORETRACT #

Disable/Enable automatic motion retraction.

The auto-retract function is a feature to avoid freezing the CG avatar when operation is interrupted.

When enabled, if no tracking message has been sent for a while (default is 1 second), MMDAgent-EX temporarily leaves tracking control. A subsequent new tracking message automatically resumes the tracking state.

When disabled, this automatic release of control does not take effect: the tracking result is always kept and shown during tracking control.

Recommended: enable this for real-time tracking, and disable it for discrete control. If you continuously send tracking messages during operation, turning this function ON can prevent the avatar from freezing in case of communication errors or tracking mistakes. Conversely, if you control the posture discretely, having it ON will cut off control one second after the posture is given, so turning it OFF in such cases will maintain the last posture.

To disable the auto-retract feature,

__AV_AUTORETRACT,false

To enable the auto-retract feature,

__AV_AUTORETRACT,true

To set auto-retract wait time and enable it,

__AV_AUTORETRACT,seconds

Note: setting 0 is not equivalent to false. Use false when you want to disable this feature completely.

__AV_START #

Starts tracking control. After this message, the tracking messages (__AV_TRACK, __AV_ARKIT, __AV_AU, __AV_EXBONE, __AV_EXMORPH) take control of the corresponding parts of the CG avatar’s body, and avatar operation through tracking messages begins.

__AV_START
  • MMDAgent-EX will issue the message AVATAR|START and also set the KeyValue Avatar_mode=1.0 when this message is received.
  • Tracking messages (__AV_TRACK, __AV_ARKIT, __AV_AU, __AV_EXBONE, __AV_EXMORPH) will take effect after this.
  • The model to be controlled should be specified beforehand with the __AV_SETMODEL message.

__AV_END #

Ends tracking control. Control of the body by the external API is turned OFF. Tracking messages (__AV_TRACK, __AV_ARKIT, __AV_AU, __AV_EXBONE, __AV_EXMORPH) sent while control is OFF are ignored.

__AV_END
  • MMDAgent-EX will issue the message AVATAR|END and set the KeyValue Avatar_mode=0.0 when this message is received.

AVATAR_LOGSAVE_START, AVATAR_LOGSAVE_STOP #

Tells MMDAgent-EX to start or stop saving all received messages to a text file.

AVATAR_LOGSAVE_START|logfile.txt
AVATAR_LOGSAVE_STOP

__AV_RECALIBRATE #

Tells MMDAgent-EX to perform calibration immediately. When auto-calibration is enabled, MMDAgent-EX performs calibration using the first __AV_TRACK messages; this message re-runs that calibration from the current point.

__AV_RECALIBRATE

Dialogue Actions #

__AV_ACTION,number #

Plays a pre-defined dialogue action. Give the number of the action to play. This works independently of tracking, even if tracking has not been started.

__AV_ACTION,3
  • The motion corresponding to the given number will be played on the target model as a partial motion.
  • The started motion plays once; it is not looped.
  • A newer __AV_ACTION overrides an older one that is still playing.
  • Tracking control overrides dialogue actions. The override takes effect per bone or morph: only the bones and morphs controlled by tracking are superseded.

The motion file that corresponds to each number is defined in the .shapemap file on the MMDAgent-EX side. An __AV_ACTION with an undefined number is ignored. In the CG-CA models, the following 39 actions are defined.

ActionNumber Description
 0  ノーマル/ normal
 1  喜び/ joy, happy
 2  笑い/ laugh, amusing
 3  笑顔/ smile
 4  微笑/ little smile, agreement
 5  固い笑顔/ graceful smile
 6  照れ笑い/ embarrassed smile
 7  困り笑い/ annoyed smile
 8  驚き・ショック/ surprise
 9  意外/ unexpected
10  感動/ impressed, excited
11  感心/ admired
12  期待・興味/ expectant, interested
13  納得・理解/ convinced
14  きりっと/ crisp, prepared
15  キメ顔/ proud, confident
16  考え中/ thinking
17  とんでもない/ no thank you (with smile)
18  同情・気遣い/ compassion, caring
19  ドヤ顔/ triumphant
20  困った/ in trouble, annoyed
21  軽い拒否・叱り/ no, accuse, disgust
22  謝り/ apology
23  緊張/ stressed
24  恥ずかしい/ embarrassed
25  ジト目/ sharp eyes, suspicion
26  くやしい/ mortifying
27  煽り/ provoking
28  眠い/ sleepy
29  焦り・怖ろしい/ impatience, fear, dread
30  呆然・唖然/ stunned, devastated
31  落胆/ disappointed
32  イライラ/ irritated, frustrated
33  怒り/ anger, furious
34  哀しみ/ sad
35  怖い/ afraid
36  不安・心のざわつき/ anxious
37  感傷的/ sentimental
38  恥じ入る/ ashamed

You can define new actions in the .shapemap file. The action number can be from 0 to 99. For example, to add a greeting motion (ojigi.vmd) as dialogue action number 40, add the line below to the .shapemap file. Note that the .vmd file path is relative to the directory of the .shapemap file, not to the current directory.

ACT40 somewhere/ojigi.vmd

After that, you can play the greeting action by sending __AV_ACTION with number 40.

__AV_ACTION,40

Head Tracking #

__AV_TRACK,x,y,z,rx,ry,rz,eyeLrx,eyeLry,eyeLrz,eyeRrx,eyeRry,eyeRrz,flag #

Sends head and eye target parameters for head tracking. MMDAgent-EX then controls the related bones of the target model toward the given target posture. Sending this message continuously performs real-time head tracking. Note that the given movements and rotations are targets relative to the default pose, not incremental commands: for example, sending a message to rotate by 30 degrees and then sending the same message again does not result in a 60-degree rotation; each message simply states that the model should be rotated 30 degrees from the default pose.

  • x,y,z: head movement (mm)
  • rx,ry,rz: head rotation on X axis, Y axis and Z axis (radian)
  • eyeLrx,eyeLry,eyeLrz: left eye rotation (radian)
  • eyeRrx,eyeRry,eyeRrz: right eye rotation (radian)
  • flag: eye rotation is global(1) or local(0)

The unit of movement is millimeters.

Head rotation should be given as local rotations around the X, Y and Z axes in radians. Note that all rotations should be given in a left-handed coordinate system.

Eye rotations should also be given as rotations around the X, Y and Z axes in radians.

The flag specifies how the eye rotations are interpreted. Set it to 0 when the eye rotations are given as local rotations (i.e., relative to the head). Set it to 1 when they are given as global rotations (i.e., absolute rotations in world coordinates). If you are sending tracking parameters from OpenFace, set this flag to 1. For Apple ARKit parameters, set it to 0.

This flag also switches how the given eye rotations are treated. When it is set to 1, the rotations of the left and right eyes are averaged into one, and the same averaged rotation is applied to both eyes using the “both-eye” bone. This forced normalization ensures that both eyes have exactly the same direction, as in OpenFace.
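
Since the movements are in millimeters and the rotations in radians, a small formatting helper might look like this (a sketch; the degree-based inputs are an assumption for convenience):

import math

def track_message(x, y, z, head_deg, eye_l_deg, eye_r_deg, eye_global=False):
    # head_deg / eye_l_deg / eye_r_deg: (rx, ry, rz) tuples in degrees
    vals = [x, y, z]
    for deg in (head_deg, eye_l_deg, eye_r_deg):
        vals.extend(math.radians(d) for d in deg)
    flag = 1 if eye_global else 0  # 1: global (e.g. OpenFace), 0: local (e.g. ARKit)
    return ("__AV_TRACK," + ",".join(f"{v:.4f}" for v in vals) + f",{flag}\n").encode()

# Usage: await ws.send(track_message(0, 0, 0, (10, 0, 0), (0, 0, 0), (0, 0, 0)))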

When __AV_AUTOCALIBRATE is enabled (the default), MMDAgent-EX treats the first head position and rotation as the neutral position, calibrating the subsequently sent parameters by the average of the first 10 messages. This auto calibration can be disabled with __AV_AUTOCALIBRATE,false.

MMDAgent-EX does not simply apply the head parameters to the head of the CG agent; it performs movement enhancement for CG-tailored tracking. In addition to the head and eyes of the CG agent, it also controls the body according to the head parameters. The actual names of the bones that head tracking should control are defined in the .shapemap file of the target model: TRACK_HEAD, TRACK_NECK, TRACK_LEFTEYE, TRACK_RIGHTEYE, TRACK_BOTHEYE, TRACK_BODY, TRACK_CENTER. The distributed CG-CA model already has the following definitions, so nothing needs to be done for CG-CA models. For models other than CG-CA, you can start by simply copying the .shapemap file to the new model, since the model structure is almost the same.

TRACK_HEAD 頭
TRACK_NECK 首
TRACK_LEFTEYE 左目
TRACK_RIGHTEYE 右目
TRACK_BOTHEYE 両目
TRACK_BODY 上半身2,上半身2,上半身1,上半身
TRACK_CENTER センター

When you define multiple bone names, as in TRACK_BODY above, MMDAgent-EX searches for the bones in the listed order, and the first one found is adopted.

If the eye rotations of the CG model look too large or too small, check the eye rotation coefficient defined in the .shapemap file. A larger value makes the CG model's eyes rotate more.

# Coef. of eye rotations
EYE_ROTATION_COEF 0.4

The sent head and eye rotations are applied to the body of the CG model with re-scaling. The re-scaling factors can be modified by defining the following items in the .mdf file (the values below are the defaults). A larger value makes more movement.

# Coef. of BODY rotation from head rotation
Plugin_Remote_RotationRateBody=0.5
# Coef. of NECK rotation from head rotation
Plugin_Remote_RotationRateNeck=0.5
# Coef. of HEAD rotation from head rotation
Plugin_Remote_RotationRateHead=0.6
# Coef. of CENTER up/down movement from head rotation
Plugin_Remote_MoveRateUpDown=3.0
# Coef. of CENTER left/right movement from head rotation
Plugin_Remote_MoveRateSlide=0.7

If you want to get the behaviors mirrored, set the following in your .mdf file.

# enable mirrored movement
Plugin_Remote_EnableMirrorMode=true

__AVCONF_ALLOWFARCLOSEMOVE,value #

Switches whether the forward/backward movement of the head parameters is applied to the model. true applies it, false does not. The default is true.

__AVCONF_ALLOWFARCLOSEMOVE,true

Facial Tracking by Shapes #

__AV_ARKIT,name=rate,name=rate,… #

Sends a set of shape target rates. Send this continuously to perform real-time face tracking. The name=rate,... part is a set of shape names and their rates [0..1]. Any number of shape names can be sent in one message.

MMDAgent-EX assigns the received shape rates to the model’s morphs. The mapping between the shape names given in the __AV_ARKIT message and the actual morph names in the target CG agent model must be defined in the .shapemap file on the model side. Shape names undefined in the .shapemap file are ignored.

As a simple example, assume you are going to control eye blink. You use the string blink as the shape name, whereas the target CG model has a blink morph named まばたき. In this case, first define the mapping in the .shapemap file of the target model like this:

blink まばたき

Then you can send parameters like this to control it. A value of 1.0 makes the model blink (eyes closed), and 0.0 opens them.

__AV_ARKIT,blink=1.0

The example above is the simplest case. For actual facial tracking, you need every facial morph to move according to the facial capture results. The CG-CA models are all equipped with 52 special morphs corresponding to the blendShapes of Apple ARKit facial tracking, and their mappings are already defined in their .shapemap files.

ARKit compliant mappings defined in CG-CA shapemap
browDown_L browDownLeft 
browDown_R browDownRight 
browInnerUp browInnerUp 
browOuterUp_L browOuterUpLeft 
browOuterUp_R browOuterUpRight 

cheekPuff cheekPuff 
cheekSquint_L cheekSquintLeft 
cheekSquint_R cheekSquintRight 

eyeBlink_R eyeBlinkRight 
eyeBlink_L eyeBlinkLeft 
eyeLookDown_L eyeLookDownLeft 
eyeLookDown_R eyeLookDownRight 
eyeLookIn_L eyeLookInLeft 
eyeLookIn_R eyeLookInRight 
eyeLookOut_L eyeLookOutLeft 
eyeLookOut_R eyeLookOutRight 
eyeLookUp_L eyeLookUpLeft 
eyeLookUp_R eyeLookUpRight 
eyeSquint_L eyeSquintLeft 
eyeSquint_R eyeSquintRight 
eyeWide_L eyeWideLeft 
eyeWide_R eyeWideRight 

jawForward jawForward 
jawLeft jawLeft 
jawOpen jawOpen 
jawRight jawRight 

mouthClose mouthClose 
mouthDimple_L mouthDimpleLeft 
mouthDimple_R mouthDimpleRight 
mouthFrown_L mouthFrownLeft 
mouthFrown_R mouthFrownRight 
mouthFunnel mouthFunnel 
mouthLeft mouthLeft 
mouthLowerDown_L mouthLowerDownLeft 
mouthLowerDown_R mouthLowerDownRight 
mouthPress_L mouthPressLeft 
mouthPress_R mouthPressRight 
mouthPucker mouthPucker 
mouthRight mouthRight 
mouthRollLower mouthRollLower 
mouthRollUpper mouthRollUpper 
mouthShrugLower mouthShrugLower 
mouthShrugUpper mouthShrugUpper 
mouthSmile_L mouthSmileLeft 
mouthSmile_R mouthSmileRight 
mouthStretch_L mouthStretchLeft 
mouthStretch_R mouthStretchRight 
mouthUpperUp_L mouthUpperUpLeft 
mouthUpperUp_R mouthUpperUpRight 

noseSneer_L noseSneerLeft 
noseSneer_R noseSneerRight 

tongueOut tongueOut 

You should not mix this message with AU-based tracking (__AV_AU).
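
For sending many shape rates per frame, a small formatting helper might look like this (a sketch; the clamping of rates to [0, 1] is an assumption for safety):

def arkit_message(shapes):
    # shapes: dict of shape name -> rate, e.g. {"jawOpen": 0.4, "eyeBlink_L": 1.0}
    body = ",".join(f"{name}={min(max(rate, 0.0), 1.0):.3f}"
                    for name, rate in shapes.items())
    return f"__AV_ARKIT,{body}\n".encode()

# Usage: await ws.send(arkit_message({"blink": 1.0}))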

Facial Tracking by Action Unit (AU) #

__AV_AU,num=rate,num=rate,… #

This message performs facial tracking based on AUs (Action Units). MMDAgent-EX controls facial morphs according to the sent AU parameters. Send this message continuously to perform real-time facial tracking based on AUs.

Repeat the num=rate part to send several AU parameters. The num should be the index of an Action Unit (from 1 to 46), and the rate should be from 0.0 to 1.0. A rate value larger than 1.0 is truncated to 1.0.

Action Units do not directly correspond to facial morphs, so you should define how the morphs are controlled for each AU parameter in the .shapemap file on the CG model side. The following is an example of assigning AU 6 (cheek raiser) to the 笑い (laughing eye) morph, AU 1 (inner brow raiser) to 上 (brow raise), and AU 4 (brow lowerer) to 困る (annoyed brow). See the Shapemap page for details.

AU6 笑い >0.7
AU1 上 0.5
AU4 困る

You should not mix this message with the other facial tracking message, __AV_ARKIT.
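
A minimal sketch of a sender for AU parameters (ws is assumed to be an open connection; the AU indices and rates are placeholders):

async def send_au(ws, aus):
    # aus: dict of AU index -> rate, e.g. {6: 0.8, 4: 0.3}
    body = ",".join(f"{num}={rate:.2f}" for num, rate in aus.items())
    await ws.send(f"__AV_AU,{body}\n".encode())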

Individual Bone Control #

__AV_EXBONE, __AV_EXBONEQ #

Controls bones via the API. Any bone on the target model can be controlled.

  • Auto calibration is not performed on this message.
  • Auto retraction will be applied to this message.

Use __AV_EXBONE to give rotations as angles around the X, Y and Z axes, or use __AV_EXBONEQ to express rotations as quaternions.

__AV_EXBONE,name,x,y,z,rx,ry,rz,name,x,y,z,rx,ry,rz,…
  • name: boneControlName
  • x,y,z: movement (mm)
  • rx,ry,rz: rotations around the X, Y and Z axes (radian)
__AV_EXBONEQ,name,x,y,z,rx,ry,rz,rw,name,x,y,z,rx,ry,rz,rw,…
  • name: boneControlName
  • x,y,z: movement (mm)
  • rx,ry,rz,rw: rotation as a quaternion (x, y, z, w components)

The mapping between the boneControlName used in the message above and the actual bone name in the target CG model should be defined in the model-side .shapemap file.

Note: CG models have special bones such as IK bones and physics-simulated bones, which are controlled by computation inside MMDAgent-EX. You can still give parameters to such bones, but the result may not be what you expect.
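
A sketch of building an __AV_EXBONEQ message from an axis-angle rotation (the control name "arm_R" is a hypothetical example; its mapping must be defined in the model's .shapemap file):

import math

def exboneq_message(name, pos, axis, angle_rad):
    # Quaternion (x, y, z, w) for a rotation of angle_rad around axis
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az) or 1.0
    s = math.sin(angle_rad / 2)
    qx, qy, qz, qw = ax / n * s, ay / n * s, az / n * s, math.cos(angle_rad / 2)
    x, y, z = pos
    fields = [name] + [f"{v:.4f}" for v in (x, y, z, qx, qy, qz, qw)]
    return ("__AV_EXBONEQ," + ",".join(fields) + "\n").encode()

# Usage: await ws.send(exboneq_message("arm_R", (0, 0, 0), (0, 0, 1), math.radians(20)))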

Individual Morph Control #

__AV_EXMORPH,name=rate,name=rate,… #

Controls an arbitrary morph via the API. Any morph on the target model can be controlled.

  • Auto calibration is not performed on this message.
  • Auto retraction will be applied to this message.

Use __AV_EXMORPH to give morph parameters.

  • name: controlMorphName
  • rate: morph value (from 0.0 to 1.0)

The mapping between the controlMorphName used in the message above and the actual morph name in the target CG model should be defined in the model-side .shapemap file.

Voice Transmission #

Audio waveform data can be streamed via the socket and played back with lip-sync. This works even when not in tracking control mode. The audio data should be raw PCM (16 kHz, 16-bit, mono; other formats are not accepted), or you can use an Opus encoder to transmit 48 kHz speech (ver. 2024.7.24 and later).

There are two modes of voice transmission: File mode and Streaming mode.

In File mode, the audio data is sent in short chunks (it is also possible to send the entire audio in one large chunk), and a speech end signal is sent at the end. MMDAgent-EX starts playing the audio upon receiving the first chunk and keeps outputting the transmitted audio data as it arrives. When the end-of-utterance message arrives, it ends the session and closes the lip-sync mouth.

In Streaming mode, audio must be transmitted in short chunks. Since no explicit speech end is given, MMDAgent-EX detects speech intervals by detecting silent parts.

The default is Streaming mode. To switch to File mode, first send SNDFILE to change the mode, then sequentially transfer the contents of the file with SNDxxxx messages. When you have finished transmitting the file, send SNDBRKS to inform MMDAgent-EX that the input has ended. To switch back to Streaming mode, send SNDSTRM.
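
A minimal sketch of a File-mode transfer of a WAV file (assumed to already be 16 kHz, 16-bit, mono PCM; ws is an open connection as in the earlier examples; the SNDxxxx chunk format is described in the next subsection):

import wave

async def send_wav_file(ws, path):
    await ws.send(b"SNDFILE\n")           # switch to File mode
    with wave.open(path, "rb") as w:      # assumed 16 kHz, 16-bit, mono PCM
        while True:
            chunk = w.readframes(640)     # 640 frames = 1280 bytes = 40 ms at 16 kHz
            if not chunk:
                break
            header = ("SND" + f"{len(chunk):04}").encode("ascii")
            await ws.send(header + chunk)  # SNDxxxx(body), no trailing \n
    await ws.send(b"SNDBRKS\n")           # signal end of utterance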

SNDxxxx(body) #

Transmits audio data. The xxxx is a 4-digit decimal number representing the byte length of the data body that follows the header. To avoid delays, it is better to send the audio in short segments of about 40 ms (1280 bytes) rather than sending long audio all at once. This message does not require a trailing \n.

Python example of sending an audio chunk:

async def send_audio(chunk_data):
    # SNDxxxx header: 4-digit decimal byte length of the body that follows
    header = ("SND" + f"{len(chunk_data):04}").encode('ascii')
    # Send header and audio body in one binary frame (no trailing \n)
    await websocket.send(header + chunk_data)

SNDFILE #

Switch to File mode.

  • MMDAgent-EX plays the sent audio as is, without trying to detect voice segments.
  • You must send SNDBRKS after each utterance has been transmitted.
SNDFILE

SNDBRKS #

Sends the end-of-utterance signal in File mode.

SNDBRKS

SNDSTRM #

Switch to Streaming mode.

  • MMDAgent-EX detects voice segments and plays only those segments
SNDSTRM

SNDOPUS #

This 7-letter message tells MMDAgent-EX to switch to Opus mode. In Opus mode, MMDAgent-EX accepts Opus-encoded packets of 48 kHz audio. The sampling rate must be 48 kHz and the audio must be monaural. The encoded speech data is transmitted in the same way as normal PCM, using the SNDxxxx message. The following is an example of a voice transmitter using Opus mode in Python with the PyOGG module.

from pyogg import OpusEncoder

# prepare encoder
encoder = OpusEncoder()
encoder.set_application("voip")
encoder.set_sampling_frequency(48000)
encoder.set_channels(1)

# tell MMDAgent-EX to switch to OPUS mode
await websocket.send(b"SNDOPUS\n")  # binary frame, per the transmission rules

...

async def send_audio_opus(chunk_data):
    # chunk_data: 48 kHz, 16-bit mono PCM; length must be a valid Opus frame size
    encoded_data = encoder.encode(chunk_data)
    # SNDxxxx header: 4-digit decimal byte length of the encoded body
    header = ("SND" + f"{len(encoded_data):04}").encode('ascii')
    await websocket.send(header + encoded_data)

SNDNPCM #

Disables Opus mode and returns to normal PCM mode.

__AVCONF_DISABLEAUTOLIP,{NO|ARKIT|AU|ARKIT+AU|ALWAYS} #

When using facial tracking and voice transmission together, the mouth shape calculated by automatic lip-sync sometimes conflicts with the mouth shape specified by facial tracking. __AVCONF_DISABLEAUTOLIP sets how automatic lip-sync is handled during these conflicts.

One of the following options can be specified. If not specified, the default is NO.

  • NO: Always apply lip-sync. In case of a conflict, the mouth shapes from both lip-sync and facial tracking are added together and displayed.
  • ARKIT: Stops the automatic lip-sync while receiving __AV_ARKIT messages.
  • AU: Stops the automatic lip-sync while receiving __AV_AU messages.
  • ARKIT+AU: Stops the automatic lip-sync while receiving either __AV_ARKIT or __AV_AU messages.
  • ALWAYS: Turns off the automatic lip-sync feature entirely.