Monday, October 15, 2018

Telemetry as part of the Dynamics 365 Finance and Operations life-cycle

How much telemetry are we collecting?

A lot, like, really a lot!

That includes kernel level information, like an online user session requesting web access to a particular AOS to perform business operations through the web UI, a web service request to handle a mobile device operation, an OData request for exporting or importing data, exceptions and other infolog messages displayed to the user, etc.

It also includes specific application-level information, like details about each step during an inventory update, or information about the different wave processing steps, the specific flow happening on the warehouse mobile device, etc.

And we keep adding to it to have more and more granular information about what exactly is happening in the system at any particular point in time.

How can I access all this telemetry?

You can find it under "View Raw Logs" under Environment Monitoring in LCS.
It's very well described in the following two articles, so please read through those at this point:
You can then use the different search options described above to query out the specific events you are interested in.

What about Warehouse - specific telemetry, say, Wave processing?

Depending on the version of the product you are running, it differs slightly. 

Here's how you would search for wave processing events on releases before Fall release of 2018:


Raw logs search criteria

Note The above query options is a preview feature, so is most probably not visible to you yet.

If you then wanted to filter on a specific wave, for example, you could add that to the search criteria as well. 
Note that due to compliance, all of the information is not exposed directly, but rather RecIds are used. So you'd need to retrieve the RecId of the wave you wanted to investigate.

Here's some of the fields you should take note of:
  • The TIMESTAMP column would tell you when exactly the event occurred
  • RoleInstance would show, which AOS the activity happened on. This could be useful when troubleshooting caching and other cross-AOS issues
  • ActivityId is a way for you to limit to only a specific smaller process (for example, a specific allocation thread), and dig deeper, for example, to analyze the slow queries that happened as part of that activity
  •  infoMessage would contain information about the actual wave step performed, as well as specific details about the step, like how many load lines were processed, how long it took, etc.
TaskName: WhsPerformanceTaskStop, ruleName: waveProcessing, actionPerformed: runWaveStep, durationInMilliSeconds: 0, details: {"allocatedLoadLines":"500","custom":"no","waveId":"5637815827","waveStep":"WhsPostEngineBase.allocateWave"}

From Fall release onward

You could search for WHSPerformanceTaskStart/Stop directly in the TaskName instead of as part of infoMessage. The rest still applies.

What about mobile device telemetry - do we capture that?

Of course we do :)

Just search for TaskName == WhsUserActivityEvent
This event contains the same basic information as mentioned above, as well as stuff specific to the mobile flow:
  • Company, Site and Warehouse, where the warehouse user is operating. (RecIds)
  • WorkExecuteMode and step for the specific flow. You'll need to lookup the actual step in code
  • GUID of the mobile device user session
  • WorkTransType, as well as WorkTable and WorkLine RecIds being processed
  • RequestXML - this is the actual request, except all the data and labels are scrubbed, so it shows just what controls are shown on the screen
This event, if you search for it in code, is invoked at the very end of processing the user input from the mobile app.

You can correlate these events with those of TaskName == RequestContext, where the url of the request contains "/api/services/WHSMobileAppServices/WHSMobileAppService/getNextFormHandHeld" - this is the actual web service call as it arrives at the AOS, so the point in time when the AOS starts handling the mobile app flow step. 

Everything that happens in between with the same ActivityID is part of the flow, whether that is slow queries, error messages or other relevant events.

I don't see these events - what should I do?

We have back-ported a lot of the telemetry through hotfixes. Search for it on LCS for your specific release. 
But I would also like to take this opportunity and move up to the latest release. There is soo much goodness in there!

Should partners be adding telemetry?

ABSOLUTELY!
  • If you are a client, I suggest that you start insisting your partners adds telemetry with any new code the push into production.
  • If you are a partner, I suggest you start adding it asap - you'll save yourself a lot of time going forward, if the customer has an issue with your code. Since you cannot just debug in production any longer, you'll need to rely much more on alternative ways of telling you what exactly happened. Telemetry is the solution.

Where do we start? Do we have some examples?

We are working on the guidance for this and will share this out very soon.


Yes, all of the application telemetry is added directly as part of the application code, so should not be too difficult to find.
I suggest that you rely on the same event "InfoLogMark" as shown below as well.
Consider adding a using statement to reduce the invocation.

Microsoft.Dynamics.ApplicationPlatform.XppServices.Instrumentation.XppRuntimeEventSource::EventWriteInfoLogMark(
                                      Exception::Info, strFmt('TaskName: , etc.', );

What else should be captured out of the box that is critical for you?

Let me know in the comments!
Thanks

1 comment:

  1. Hi Ivan,
    Has there been any update to the example guide referenced in this post? Hoping to gain some insight on adding telemetry info to custom code. Thanks in advance.

    ReplyDelete

Please don't forget to leave your contact details if you expect a reply from me. Thank you