The project aims to develop a system that acquires and processes medical observations, stores them in the patient's clinical observation sheet, and retrieves them from it.

The system is designed to help physicians enter their observations in the observation sheet during their daily patient visits. Our approach uses the most common recording device available today, the mobile phone (smartphone), to record the physician's observations and to convert the recorded speech into a text file, which is later added to the observation sheet kept next to the patient's bed.

The architecture of the system consists of the following components:

  • A mobile phone (smartphone) running iOS or Android
  • A local server that provides connectivity to a cloud voice-processing system
  • A speech-to-text SaaS (Software as a Service) cloud service offered by Vocapia

When the physician begins his scheduled visits, he starts an application on his mobile phone that connects to the local server, and he authenticates with a username and a password using the Kerberos protocol. The connection is local and goes through the hospital's Wi-Fi hotspots.

After authentication, the physician can start his visits. A QR code (Quick Response code) and an identification code are printed on each patient's observation sheet. The QR code contains the same identification code together with the patient's name and can be used to identify the patient.
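
As an illustration of how the application might turn a scanned QR payload or a typed code into a patient reference, here is a minimal sketch. The text only says the QR code carries the identification code and the patient name; the `<code>;<name>` layout, the separator, and the names `PatientRef`, `parse_qr_payload`, and `from_typed_code` are assumptions made for this example.

```python
from typing import NamedTuple, Optional


class PatientRef(NamedTuple):
    code: str
    name: Optional[str]  # None when the code was typed by hand


def parse_qr_payload(payload: str, sep: str = ";") -> PatientRef:
    """Split a scanned payload of the ASSUMED form '<code>;<name>'."""
    code, found, name = payload.strip().partition(sep)
    code, name = code.strip(), name.strip()
    if not code:
        raise ValueError("payload has no identification code")
    return PatientRef(code, name if found and name else None)


def from_typed_code(code: str) -> PatientRef:
    """Manual entry carries only the identification code."""
    code = code.strip()
    if not code:
        raise ValueError("empty identification code")
    return PatientRef(code, None)
```

Both entry paths yield the same identification code, so the rest of the application does not need to know how the patient was identified.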

To add observations, the physician scans the QR code or types the code into the application, presses Record, and speaks into the phone's microphone. The recording can be restarted or paused at any time (pausing is the default behavior when the application is moved to the background, for example when the phone rings or another application is used). When the recording is finished, the physician presses the Send button and can move on to the next patient.
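
The recording workflow above can be read as a small state machine. The sketch below is one possible model, not the application's actual code: the state names, the `RecordingSession` class, and the reading of "restart" as "discard and start over" are assumptions.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"
    SENT = "sent"


class RecordingSession:
    def __init__(self, patient_id: str):
        self.patient_id = patient_id
        self.state = State.IDLE

    def record(self) -> None:
        # Start a new recording or resume a paused one.
        if self.state not in (State.IDLE, State.PAUSED):
            raise RuntimeError(f"cannot record from state {self.state.value}")
        self.state = State.RECORDING

    def pause(self) -> None:
        # Pausing only has an effect while recording.
        if self.state is State.RECORDING:
            self.state = State.PAUSED

    def on_background(self) -> None:
        # Default behavior when the app loses focus (incoming call, other app).
        self.pause()

    def restart(self) -> None:
        # Assumed meaning: discard the captured audio and record again.
        if self.state is State.SENT:
            raise RuntimeError("recording already sent")
        self.state = State.RECORDING

    def send(self) -> None:
        if self.state not in (State.RECORDING, State.PAUSED):
            raise RuntimeError("nothing to send")
        self.state = State.SENT
```

Routing the background event through `pause()` keeps the interruption case identical to a manual pause, so resuming works the same way in both situations.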

The generated mp3 file, tagged with the patient's identification number, is sent to the server, which stores the recording in an archive together with metadata such as the author of the recording. The server then forwards the mp3 file, tagged with the patient ID, to a speech-to-text SaaS service in an external cloud. The SaaS service returns the text associated with the mp3 file as an XML file in which each word is tagged with its timestamp in the audio file. This allows fast correction when needed, either by the physician or by other authorized users, through a web application.
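
To make the word-level timestamps concrete, the following sketch parses a small transcript into `(start_seconds, word)` pairs. The element and attribute names (`Transcript`, `Word`, `start`, `dur`) are placeholders invented for this example and are not taken from the SaaS provider's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript layout; the real service schema may differ.
SAMPLE = """<Transcript patient="P-0042">
  <Word start="0.00" dur="0.40">patient</Word>
  <Word start="0.45" dur="0.30">is</Word>
  <Word start="0.80" dur="0.55">stable</Word>
</Transcript>"""


def parse_words(xml_text: str) -> list:
    """Return (start_seconds, word) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (float(w.get("start")), (w.text or "").strip())
        for w in root.iter("Word")
    ]
```

Keeping the start time next to each word is what later lets a correction tool jump from a word to the matching position in the audio.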

The phone application will be built for iOS and Android. The application will have the following functions:

  • Allows the user to authenticate using the Kerberos protocol
  • Scans the QR code (or allows the patient code (patient ID) to be entered with the phone keyboard) to identify the patient
  • Records the physician's observations in mp3 format and sends the file, tagged with the patient ID, to the server

The application does not store the authentication credentials or any information about the patient; moreover, the mp3 file is deleted after the server confirms that it has received the message.
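
The delete-after-confirmation rule can be sketched as follows. The `upload` callable stands in for the real network call and is hypothetical, as is the function name `send_and_cleanup`; the only behavior taken from the text is that the local file is removed once the server confirms receipt.

```python
import os
from typing import Callable


def send_and_cleanup(path: str, patient_id: str,
                     upload: Callable[[bytes, str], bool]) -> bool:
    """Send the recording and delete the local copy only on confirmation.

    `upload` is a hypothetical callable that returns True when the
    server acknowledges the file.
    """
    with open(path, "rb") as f:
        audio = f.read()
    confirmed = upload(audio, patient_id)
    if confirmed:
        os.remove(path)  # no patient-related data stays on the phone
    return confirmed
```

If the upload is not confirmed, the file stays in place so that the physician can try sending it again.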

The local server is responsible for the following services:

  1. Authentication of the physicians who use phones to add observations to the patient observation sheet.
  2. The server receives the mp3 files sent by the mobile application and stores them unaltered in a database (along with the other patient data). Each file is used later to check the text conversion of the audio and can also serve as evidence in malpractice investigations. The file does not need to be encrypted because it does not contain the patient's personal information.
  3. The server connects to the SaaS service in the cloud over an SSL-encrypted connection and uses the REST API to submit a speech-to-text request for the mp3 file received from the mobile application. The same service receives the XML file from the SaaS service when the data is ready and stores it in the same database.
  4. A web application on the server allows users to view and edit the patient observation sheet according to the privileges associated with each user. The application integrates forms for displaying the observation sheet, a text editor for the transcript, and an mp3 player. It is accessed over an SSL-encrypted connection and only from within the hospital. The physician who recorded the observations with the phone application, or another authorized user, can edit the speech-to-text XML file; each change is tracked and stored in the database (a change history is stored along with the file). This editing feature is needed because the speech-to-text system is not 100% error-free: some words are transcribed incorrectly and must be corrected.
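
The change tracking described in item 4 could look roughly like the sketch below. It is an in-memory model under stated assumptions: the `Transcript` and `Change` classes and their fields are hypothetical, and a real implementation would persist the history in the database rather than in a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    index: int       # position of the corrected word
    old: str
    new: str
    author: str
    at: datetime


@dataclass
class Transcript:
    words: list                      # items are [start_seconds, text]
    history: list = field(default_factory=list)

    def correct(self, index: int, new_text: str, author: str) -> None:
        """Replace one word, keep its timestamp, and log the change."""
        old = self.words[index][1]
        if old == new_text:
            return                   # nothing changed, nothing to log
        self.words[index][1] = new_text
        self.history.append(
            Change(index, old, new_text, author, datetime.now(timezone.utc))
        )
```

Because a correction replaces only the text and leaves the start time untouched, the link between the edited word and the audio is preserved.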

The web application also uses the timestamps from the XML file, in which each word of the transcript is tagged with a timestamp; this allows fast editing and correction of the transcript. When a word in the transcript is selected, the mp3 player integrated in the application plays the file from the time associated with that word, so the user can go directly to the point where the word was spoken without playing the entire file.
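
One way to implement the jump from a selected word to its audio position is to map the editor's cursor offset to a word index and read that word's start time. The sketch below assumes the transcript is displayed as words joined by single spaces; `build_index` and `seek_time` are hypothetical helper names.

```python
from bisect import bisect_right


def build_index(words):
    """Return the display text and the character offset where each word starts.

    `words` is a list of (start_seconds, text) pairs.
    """
    offsets, pos = [], 0
    for _, text in words:
        offsets.append(pos)
        pos += len(text) + 1  # +1 for the joining space
    return " ".join(text for _, text in words), offsets


def seek_time(words, offsets, cursor):
    """Audio position (seconds) of the word at or before the cursor offset."""
    i = max(bisect_right(offsets, cursor) - 1, 0)
    return words[i][0]
```

The player would then be asked to start playback at the returned position. A cursor placed on a space resolves to the preceding word.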

The services offered by the server are implemented as four distinct applications (agents), as shown in Fig. 1. The database will be implemented using IBM DB2 Express-C because of its native pureXML support: XML files are stored in pages in the database, which offers superior performance.

Fig. 1.  The connectivity between the offered services.