High quality compressed speech data is easily recorded and replayed on a BBC microcomputer using this speech digitization unit.
by A.L. EVANS AND J. FENNER
OKI SEMICONDUCTOR DEVICES
ADPCM speech chips are available from stock at Manhattan Skyline of Maidenhead, who are sole UK distributors for OKI semiconductor parts. A chip set comprising MSM5218RS, MSM5204RS, two ALP4B and a 384kHz ceramic resonator is available to readers at the special price of £16.70 (x1.34 for $), half the normal one-off price. Manhattan Skyline Ltd are at Manhattan House, Bridge Road, Maidenhead, Berks SL6 8DB.
The sequential nature of tape recorders limits the therapeutic and educational techniques that can be used. A random-access tape recorder would allow a variety of approaches to be tried, though rapid access to sequential tape or cassette recordings is expensive (ref. 1). A useful system can be implemented by using a BBC B microcomputer and storing the speech data on floppy discs, though the impracticality of storing huge files on disc makes some form of data compression desirable. The system can also be used to record a.d.p.c.m. data for prom-based synthesizers.
The system components are shown in Fig.1. Adaptive differential pulse code modulation, a waveform coding technique, compresses the data by factors of two to four.
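The principle of a.d.p.c.m. can be illustrated in a few lines: only the difference between each sample and a running prediction is coded, and the quantizer step size grows or shrinks with the size of recent differences. The sketch below is a deliberately simplified four-bit coder written for illustration. It is not the algorithm used inside the MSM5218; the step limits, adaptation factors and function names are all assumptions.

```python
import math

STEP_MIN, STEP_MAX = 16, 2048   # assumed step-size limits, not OKI's values

def _decode_step(code, predicted, step):
    """Apply one 4-bit code to the decoder state; return (predicted, step)."""
    magnitude = code & 0x7
    delta = (2 * magnitude + 1) * step // 8      # reconstructed difference
    if code & 0x8:                               # top bit carries the sign
        delta = -delta
    predicted = max(-2048, min(2047, predicted + delta))   # keep within 12 bits
    # Adapt: large codes widen the step, small codes narrow it.
    if magnitude >= 4:
        step = min(STEP_MAX, step * 3 // 2)
    else:
        step = max(STEP_MIN, step * 3 // 4)
    return predicted, step

def encode(samples):
    """Code signed 12-bit samples as a list of 4-bit values."""
    predicted, step = 0, STEP_MIN
    codes = []
    for sample in samples:
        diff = sample - predicted
        sign = 0x8 if diff < 0 else 0
        magnitude = min(7, abs(diff) * 4 // step)
        code = sign | magnitude
        codes.append(code)
        # The encoder tracks the decoder's state so the two stay in step.
        predicted, step = _decode_step(code, predicted, step)
    return codes

def decode(codes):
    """Rebuild an approximation of the waveform from 4-bit codes."""
    predicted, step = 0, STEP_MIN
    out = []
    for code in codes:
        predicted, step = _decode_step(code, predicted, step)
        out.append(predicted)
    return out

# Two cycles of a test tone, coded and rebuilt.
samples = [int(1000 * math.sin(2 * math.pi * n / 32)) for n in range(64)]
codes = encode(samples)
decoded = decode(codes)
```

Because the encoder runs the same reconstruction step as the decoder, the two never drift apart; the saving comes from storing four bits per sample rather than the full sample width.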
All units are standard, readily available items apart from the speech digitization unit, the design of which is described in this article.
Data is exchanged between the digitization unit and computer memory via the 1MHz bus. Power is supplied via the 5V line in the analog port, leaving the auxiliary power socket free for the disc drive. The system is controlled by means of a touch pad (concept keyboard) which allows the recording or playback of a large number of files of data, limited only by the disc filing system (31 for the Acorn DFS). Each file can hold as much data as the computer memory allows, but the data capacity of the disc is soon approached unless an 80-track drive is used.
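A rough calculation shows why disc capacity rather than the 31-file limit is the practical constraint. The sketch below assumes the standard single-density Acorn DFS layout (10 sectors of 256 bytes per track, two sectors taken by the catalogue) and a full-length recording of about 24K; the names and the 24K figure are illustrative assumptions.

```python
SECTOR_BYTES = 256
SECTORS_PER_TRACK = 10
CATALOGUE_SECTORS = 2      # sectors 0 and 1 hold the DFS catalogue
MAX_FILES = 31             # catalogue limit quoted in the text

def files_per_side(tracks, file_bytes):
    """Number of equal-sized files that fit on one side of a disc."""
    usable = tracks * SECTORS_PER_TRACK - CATALOGUE_SECTORS
    sectors_per_file = -(-file_bytes // SECTOR_BYTES)   # round up to whole sectors
    return min(MAX_FILES, usable // sectors_per_file)

FULL_RECORDING = 24 * 1024   # assumed size of a full-length recording

print(files_per_side(40, FULL_RECORDING), files_per_side(80, FULL_RECORDING))  # 4 8
```

Shorter words and phrases occupy fewer sectors, so the number of files that fit rises accordingly.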
In practice, a useful system has been implemented with the concept keyboard allowing a choice of nine files, each containing the speech data corresponding to a word or a short phrase. A small area of the touch pad is reserved for controlling the record or playback mode; a prompt appears on the screen so that the user is aware of the currently selected mode. Writing the data file to disc takes one to two seconds and if the selected word or phrase is not currently in memory, there is a similar delay while data is read from the disc before speech starts.
SPEECH DIGITIZATION UNIT
Direct digitization of speech waveforms at 8 kHz only allows some three seconds' worth of data to be acquired in the 32K memory of the standard BBC B microcomputer (allowing a modest 8K for program storage). To gain more recording time, the adaptive differential p.c.m. technique implemented by the OKI company in their MSM5218 chip is used to give a data compression factor of two. In addition, the system allows software selection of sampling frequency, implementation of analysis and synthesis on the same chip, easy interfacing to an input/output port, and low-power c-mos circuitry.
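The recording times follow from simple arithmetic. A minimal sketch, assuming that 32K means 32 x 1024 bytes, that 8K is set aside for the program, and that a compression factor of two corresponds to four bits per stored sample (the function and variable names are illustrative):

```python
def recording_seconds(memory_bytes, sample_rate_hz, bits_per_sample):
    """Seconds of speech that fit in memory at the given rate and sample width."""
    samples = memory_bytes * 8 // bits_per_sample
    return samples / sample_rate_hz

# Assumed memory budget: 32K total less 8K reserved for the program.
DATA_BYTES = (32 - 8) * 1024

direct = recording_seconds(DATA_BYTES, 8000, 8)   # direct 8-bit digitization
adpcm = recording_seconds(DATA_BYTES, 8000, 4)    # 4-bit a.d.p.c.m. codes

print(f"direct: {direct:.1f} s, a.d.p.c.m.: {adpcm:.1f} s")
# direct: 3.1 s, a.d.p.c.m.: 6.1 s
```

Lowering the sampling frequency lengthens the recording in proportion, at the cost of audio bandwidth.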
In the digitization unit, Fig.2, the audio input from the microphone is buffered, filtered and amplified before entering the eight-bit analog-to-digital converter (MSM5204). The data converter has a built-in sample-and-hold function, and is controlled by the start-conversion signal of the analysis/synthesis chip. Data is transferred to a parallel-to-serial converter (4014) before being presented to the 5218 as eight-bit serial data. Since the 5218 expects a 12-bit data input, the least significant four bits are padded out with zeros using hard-wired circuitry (4024 and 4011 on the full circuit diagram in Fig.3). The analyzed data is transferred to memory from the data pins D0-D3 of the 5218. Figure 4 shows the timing diagrams for the data transfer to and from the microcomputer via the 6821 peripheral interface adapter. Port A of the 6821 p.i.a. is reserved for bidirectional data transfer while port B is dedicated to controlling the 5218 chip functions. The master timing signal VCK from the 5218 is detected via the CB1 pin of the p.i.a. The p.i.a. is addressed by the computer at &FCEX using minimal decoding logic (Fig.3).
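In arithmetic terms, the hard-wired zero padding amounts to shifting each eight-bit sample four places to the left. A minimal sketch of that operation follows, together with one possible way of packing the four-bit output codes two to a byte. The packing order is an assumption, since the article does not say how the codes are laid out in memory, and the function names are illustrative.

```python
def pad_to_12_bits(sample8):
    """Extend an 8-bit sample to 12 bits by appending four zero bits."""
    if not 0 <= sample8 <= 0xFF:
        raise ValueError("sample must fit in eight bits")
    return sample8 << 4

def pack_nibbles(codes):
    """Pack 4-bit codes two per byte, first code in the high nibble (assumed order)."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)                # pad an odd-length sequence with a zero code
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))

def unpack_nibbles(data):
    """Inverse of pack_nibbles: split each byte back into two 4-bit codes."""
    out = []
    for byte in data:
        out.extend((byte >> 4, byte & 0x0F))
    return out
```

For example, `pad_to_12_bits(0x80)` gives `0x800`, and `pack_nibbles([1, 2, 3, 4])` gives the two bytes `0x12` and `0x34`.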
Data returns from the computer again via the 6821 p.i.a. The 5218 is placed into synthesis mode by taking pin 6 low (port B bit 3). The VCK signal from the 5218 still controls the data transfer (Fig.4). An analog output is produced at pin 18 and after low-pass filtering the signal enters an audio amplifier before the speech is generated by an external speaker.
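Switching modes therefore comes down to clearing or setting a single bit in the value written to port B. A minimal sketch of the bit manipulation, assuming that a high level on bit 3 selects analysis (the article states only that low selects synthesis) and using illustrative names:

```python
SYNTHESIS_SELECT = 1 << 3   # port B bit 3, wired to pin 6 of the 5218

def select_synthesis(port_b):
    """Return the port B value with bit 3 low (synthesis mode)."""
    return port_b & ~SYNTHESIS_SELECT & 0xFF

def select_analysis(port_b):
    """Return the port B value with bit 3 high (assumed to select analysis)."""
    return (port_b | SYNTHESIS_SELECT) & 0xFF
```

The other bits of port B pass through unchanged, so the remaining control lines keep their state.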
The components used in the speech digitization unit are readily available (OKI chips, active filters and resonator from Manhattan Skyline, Bridge Road, Maidenhead, Berks SL6 8DP). Component costs, including interconnecting plugs, cable and the enclosure, are below £55 (x1.34 for $).
The quality of the speech produced is good, its intelligibility being close to that of high-quality recorded speech and significantly better than that of l.p.c. speech in a similar environment.
The programs are written in Basic with machine code routines controlling the acquisition and playback of data. The main program is described by the flow diagram of Fig.5. The system waits for a touch-pad key to be selected and then branches to the routines controlling the data transfer.
If recording of speech data is selected, the acquisition routine is entered. All interrupts are disabled so that data acquisition occurs at a predictable rate. After the p.i.a. has been initialized, the system is synchronized with the VCK signal from the digitization unit and data is acquired until the recording is terminated by the touch pad. When the allocated section of memory is full, new data starts overwriting the same section from the beginning, and a pointer keeps track of the write position so that the most recent six seconds of speech are retained. When acquisition is terminated, the program adds a stop code, re-enables interrupts and returns to Basic so that the data may be transferred to disc.
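The wrap-around scheme behaves like a ring buffer. A minimal sketch follows, in Python rather than the machine code of the original routines. The class name, the buffer size and the way the pointer is used on read-out are assumptions, and the stop code is omitted because its value is not given.

```python
class RingRecorder:
    """Fixed-size buffer that keeps only the most recent `size` bytes written."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.ptr = 0          # next position to write
        self.wrapped = False  # True once the buffer has filled at least once

    def write(self, byte):
        self.buf[self.ptr] = byte
        self.ptr += 1
        if self.ptr == self.size:   # end of the allocated section: wrap around
            self.ptr = 0
            self.wrapped = True

    def latest(self):
        """Return the retained data in chronological order."""
        if not self.wrapped:
            return bytes(self.buf[:self.ptr])
        # Oldest surviving byte sits at the pointer; newest just before it.
        return bytes(self.buf[self.ptr:] + self.buf[:self.ptr])

rec = RingRecorder(4)
for value in range(1, 7):     # write 1..6 into a four-byte buffer
    rec.write(value)
print(list(rec.latest()))     # [3, 4, 5, 6]
```

With a 24K buffer of four-bit codes sampled at 8 kHz, the same scheme would retain roughly the last six seconds of speech.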
In a similar way, when playback is selected, the program initially decides whether the desired speech is already in memory and if necessary reads the data in from a disc file. Again, all interrupts are disabled before the system initializes the p.i.a. for transfer of data to the digitization unit. The system detects the VCK signal from the 5218 and transfers the data according to the timing diagram of Fig.4. When the stop code is detected, interrupts are re-enabled and the system returns to the main program.
The system's versatility has been extended by providing an overlay editor which allows, for example, a therapist to design an overlay on the touch pad (see below). Thus the user has control of the number and size of sensitive elements on the touch pad. The overlay editor can be entered at the beginning of the main program, but the program defaults to using the current overlay after a suitable time delay.
Software. Listings of the programs controlling the acquisition and replay of the speech data, and of the overlay editor, together with the p.c. prototype layout, are available from the editorial office in return for a self-addressed A4 envelope, marked 'phrase recorder'.
Aled Evans and John Fenner are medical physicists in the West of Scotland Health Board's Department of Clinical Physics and Bioengineering in Glasgow.
OKI Semiconductor Application Note. Simultaneous speech analysis and synthesis with MSM5218, 1982.
1. Thomas, A. The random access tape recorder system. Proc. 4th Annual Conference on Rehabilitation Engineering, Washington D.C. 1981.
2. Keating, D., Evans, A.L., Wyper, D.J. and Cunningham, E. Comparison of the intelligibility of some low-cost speech synthesis devices. British Journal of Disorders of Communication (vol.21), 1986, pp 167-172.
(adapted from: Wireless World, Jan. 1987)