Why did the TI-99/4 have two databuses?




Wikipedia's Russian page says (translated):



However, only 256 bytes of static RAM and the system ROM were connected to the 16-bit bus. The rest of the memory (RAM) and the peripherals were 8-bit and were connected through a multiplexer, which doubled the number of cycles needed to access them.



and its German page says (translated):



In this way, all 8-bit system components, such as the graphics chip, sound chip or GROM chips, can be addressed by the CPU with the appropriate word width.



Now, given that the TMS9900 has a 16 bit databus, it seems to me that they could have put all memory and all peripherals on that bus. It would have saved the cost of the multiplexor, and made the computer a good deal faster too. Presumably it would also have simplified the routing of the circuit board.



If some hardware register is only 8 bits wide, it's not a problem to ignore the upper 8 bits, is it? When writing, just ignore those bits; when reading, just let those lines float. That way, it would behave like the upper bits of the Commodore 64's Color RAM, which is 4-bit memory connected to an 8-bit bus.



So my question is why the TI-99/4 had these two databuses. Why not put everything on a single databus?






"It would have saved the cost of the multiplexor" ... I can't find a schematic, but my suspicion is that the multiplexor is at most 4 or 5 74-series ICs which would have cost < $1 each, even at late 70s prices. And that's even if you ignored the fact that TI manufactured them themselves... the cost of redesigning would have been a lot more significant.

– Jules
Sep 13 '18 at 14:02






What Wikipedia article is the quoted information from?

– Steven M. Vascellaro
Sep 13 '18 at 15:41






Of course, it's worth noting that it actually has 3 data buses: the 16-bit bus, the 8-bit peripheral bus, and the 8-bit bus between the VDP and the video RAM.

– Jules
Sep 15 '18 at 18:56




4 Answers



According to Wikipedia,



The unusual architecture of the 99/4 series is documented to be due to the failure of the 9985, an 8-bit processor which was being created specifically for the machine. When it was abandoned, the 16-bit 9900 was selected to replace it, and a great deal of "glue logic" had to be added to fit the processor into the existing design, while no changes were made to take advantage of the 9900's strengths.



(Surprisingly for Wikipedia, especially since this “is documented”, there is no reference to supporting documentation.)



Apparently the TI-99/4 was designed as an 8-bit system, but when its intended 8-bit CPU failed to materialise, the 9900 was retro-fitted into it. The 9900's architecture, in particular its reliance on external registers accessed via its 16-bit databus, meant that a 16-bit bus had to be added alongside the existing 8-bit bus.






Let's say it was intended to use a 16 bit microcontroller with an 8 bit external memory expansion bus. Also, the 9900 wasn't retrofitted; rather, during development a plug-in daughter board with a 9900 and external circuitry to emulate the memory protocol of the microcontroller was made and placed instead of the microcontroller. When the deadline for the computer closed in, the microcontroller was still not finished, so they decided to integrate the daughter board for the first series. And that's the way it stayed.

– Raffzahn
Sep 13 '18 at 14:11






@Raffzahn, wow, that raises the question then, why the bizarre memory protocol?

– Stephen Kitt
Sep 13 '18 at 14:13






Stephen, it's rather simple when looking at the intended architecture, which is unusual from a canonical point of view, but nonetheless a good one. I've got to do a shopping run right now and will write a better answer tonight. Ok?

– Raffzahn
Sep 13 '18 at 14:15






@Raffzahn oh there’s no rush ;-)

– Stephen Kitt
Sep 13 '18 at 14:22






@Raffzahn - I need milk, can you pick some up for me?

– Jeremy
Sep 14 '18 at 12:51



Now, given that the TMS9900 has a 16 bit databus, it seems to me that they could have put all memory and all peripherals on that bus. It would have saved the cost of the multiplexor, and made the computer a good deal faster too. Presumably it would also have simplified the routing of the circuit board.



Yes it would - if the computer had been intended to be built with a 9900 in the first place.



So my question is why the TI-99/4 had these two databuses. Why not put everything on a single databus?



TL;DR: The CPU operates a 16 bit bus, and the core components are on this bus. The 8 Bit bus is an extension or I/O bus - essentially what's accessible on the right side connector and the cartridge port. Think of this like 32 Bit Local Bus vs. 16 Bit ISA on a 486 PC.



A thorough explanation calls for a differentiated answer: what the design was, and what happened to it.



The 99/4 was meant to be a low cost home computer design. The 9900 was, at that time, a top end 16 bit processor (*1) with great performance, but also a rather high price tag. 990 family CPU boards using a 9900 were priced close to or above 1000 USD. Building a home computer with a 9900 would have made a quite powerful, but also rather expensive machine, not least because it would need 16 bit wide memory, making a humongous (in 1977/78 (*2)) 32 KiB the basic RAM (*3). Similarly, ROMs must also offer 16 bit data, and so on. A machine like that would have been extremely expensive, and definitely not able to compete against other home computers - and consoles - of the time.
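One plausible reading of the 32 KiB figure (an assumption, not spelled out above): with the 16 Kbit x 1 DRAM chips common in 1977/78, such as the 4116, each data line needs its own chip, so a 16 bit bus cannot be populated with fewer than 16 chips. A quick check of that arithmetic:

```python
# Minimum RAM bank size for a given data bus width, ASSUMING
# bit-wide DRAM chips (one chip per data line), e.g. 16 Kbit x 1 parts.
CHIP_BITS = 16 * 1024  # bits per DRAM chip (assumed 4116-class part)


def min_bank_kib(bus_width_bits: int, chip_bits: int = CHIP_BITS) -> float:
    """Smallest RAM that puts one chip on every data line, in KiB."""
    chips = bus_width_bits             # one 1-bit-wide chip per data line
    total_bits = chips * chip_bits
    return total_bits / 8 / 1024       # bits -> bytes -> KiB


print(min_bank_kib(8))   # -> 16.0 (8 chips)
print(min_bank_kib(16))  # -> 32.0 (16 chips)
```

With older 4 Kbit parts the minimums would be a quarter of that (4 KiB and 8 KiB), which is roughly where the 4 KiB machines of footnote 3 sat.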



But there was also an 8 bit version, the TMS9980. It's basically to the 9900 what the 8088 is to the 8086. The same CPU, just with an external 8 bit bus and a memory interface splitting each word access into two byte wide accesses. On the downside, it would (almost) halve the CPU speed. Not really what engineers like to design.



Without a closer look, this only leaves the choice between a great but expensive machine and a cheap(er) but quite slow one. Now, looking at the most common job for a home computer, running BASIC programs, reveals that most of the memory, where the BASIC code resides, is only accessed in 8 bit portions anyway. It also doesn't really matter if that access is slower than usual, as there will be many native instructions executed between each fetch of BASIC code. Furthermore, there are not many other RAM locations a BASIC interpreter (and an underlying BIOS) needs at all.



So the idea was born to use a version of the 9980 with a 16 bit wide on-board ROM for some BIOS and BASIC, some 16 bit wide RAM (256 Bytes (*4)) and an external 8 bit bus to access user RAM and other I/O components. The core system could run at full speed, while slower and less frequently used components were accessed using the 8 bit I/O bus (*5), thus enabling the use of less expensive standard 8 bit components.



The story could end here, but the TI engineers went a step further and designed the whole machine around a streaming access concept. External units that offered more than just a few port registers were supposed to provide streaming access to their content. This means that after a start address (within the device) was set, each consecutive access would deliver the next byte, or take one when writing. Each of these streams was to be accessed through a designated port (hard-wired or dynamically assigned) mapped into the CPU address space, effectively enabling an almost unlimited amount of RAM or ROM. The latter was most important for game cartridges, as the 4 KiB limit of the 2600 was already recognised as a serious issue.
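A minimal sketch of that streaming idea, with hypothetical method names (this is not TI's actual port layout): the host loads a start address into the device once, and every following data access returns (or stores) the next byte while the device advances its own internal address counter. The host only ever sees an address port and a data port, however much memory sits behind them.

```python
class StreamDevice:
    """Toy model of a streaming memory device (GROM/VDP-like).

    HYPOTHETICAL interface: set_address() stands in for the address
    port, read()/write() for the data port. The internal address
    auto-increments and wraps at the device's address width.
    """

    def __init__(self, size: int, addr_bits: int = 16) -> None:
        self.mem = bytearray(size)
        self.mask = (1 << addr_bits) - 1
        self.addr = 0

    def set_address(self, addr: int) -> None:
        """Address port: load the start address inside the device."""
        self.addr = addr & self.mask

    def _next_address(self) -> int:
        """Return the current address and advance it (with wrap-around)."""
        current = self.addr
        self.addr = (self.addr + 1) & self.mask
        return current

    def read(self) -> int:
        """Data port read: next byte; unpopulated addresses read as 0."""
        a = self._next_address()
        return self.mem[a] if a < len(self.mem) else 0

    def write(self, value: int) -> None:
        """Data port write: store at the current address, then advance."""
        a = self._next_address()
        if a < len(self.mem):
            self.mem[a] = value & 0xFF


# Usage: set the address once, then stream bytes in and out.
dev = StreamDevice(size=8 * 1024)
dev.set_address(0x0100)
for b in b"HELLO":
    dev.write(b)
dev.set_address(0x0100)
print(bytes(dev.read() for _ in range(5)))  # b'HELLO'
```

Because the CPU never addresses the device memory directly, growing it only means widening the device's internal counter, which is the point made about the VDP RAM below.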



The streaming concept was also used for video and BASIC RAM. Instead of simply mapping 16 KiB into the main address space and sharing it with some CRT controller, the 9918 was designed to manage all of that RAM and offer streaming access to the host system. This not only saved the design of a separate DRAM controller, but also let BASIC store the program within video memory, freeing it from any limitations of the CPU address space. In theory the VDP RAM could be extended well beyond 16 KiB (*6) without ever colliding with any other component. A 1 MiB 99/4 would just have required an additional address register ... and there was plenty of unused address space to put it in.
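For the 9918 specifically, the host sets the streaming address with two byte writes to the VDP's control port: first the low 8 address bits, then a byte whose two top bits select a read (00) or write (01) setup and whose remaining 6 bits carry the upper address bits. The function names below are made up for illustration; the bit layout follows the 9918 data manual. The sketch shows how the 14 bit (16 KiB) address limit is baked into the command format, which is why a bigger VRAM would need an extra address register.

```python
from typing import Tuple

# Byte pairs the CPU writes to the TMS9918 control port.
# Function names are HYPOTHETICAL; only the bit layout is the 9918's.
VRAM_ADDR_BITS = 14  # 14 bit address -> 16 KiB of VRAM


def vdp_address_setup(addr: int, write: bool) -> Tuple[int, int]:
    """Return (first_byte, second_byte) for a VRAM read or write setup."""
    if not 0 <= addr < (1 << VRAM_ADDR_BITS):
        raise ValueError("9918 VRAM address must fit in 14 bits")
    low = addr & 0xFF                 # sent first: low 8 address bits
    high = (addr >> 8) & 0x3F         # upper 6 address bits
    mode = 0x40 if write else 0x00    # top bits 01 = write, 00 = read
    return low, mode | high


def vdp_register_write(reg: int, value: int) -> Tuple[int, int]:
    """Register writes use top bits 10: data byte first, then 0x80 | reg."""
    return value & 0xFF, 0x80 | (reg & 0x07)


print([hex(b) for b in vdp_address_setup(0x3FFF, write=True)])   # ['0xff', '0x7f']
print([hex(b) for b in vdp_address_setup(0x0800, write=False)])  # ['0x0', '0x8']
```

After such a setup, each access to the VDP's data port transfers one byte and advances the VDP's internal address, exactly the streaming pattern described above.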



A similar concept for streaming ROMs was devised, the so-called GROM. Without modification, a standard 99/4 could address up to 16 external GROM base addresses with 40 KiB each (in default configuration). 640 KiB ROM for a 1977 machine does sound nice, doesn't it? (*7)
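The 640 KiB figure, and the 480 KiB of footnote 7, can be checked with a few lines using the numbers given in this answer and in its comments: 64 KiB of GROM address space per base, 8 KiB slots, three slots taken by the console GROMs, 16 bases scanned by the OS, and 6 KiB of data per real chip.

```python
# GROM capacity check using the figures quoted in this answer.
KIB = 1024
BASE_SPACE = 64 * KIB   # one 16 bit GROM address space per base
SLOT = 8 * KIB          # each chip answers in an 8 KiB slot
CONSOLE_SLOTS = 3       # console GROMs answer at every base
BASES = 16              # bases scanned by the OS at start-up
CHIP_DATA = 6 * KIB     # data actually stored in a TI-made GROM

slots_per_base = BASE_SPACE // SLOT               # 8
external_slots = slots_per_base - CONSOLE_SLOTS   # 5

theoretical_kib = BASES * BASE_SPACE // KIB                         # full space
per_base_kib = BASES * (BASE_SPACE - CONSOLE_SLOTS * SLOT) // KIB   # 16 x 40 KiB
real_chips_kib = BASES * external_slots * CHIP_DATA // KIB          # 16 x 5 x 6 KiB

print(theoretical_kib, per_base_kib, real_chips_kib)  # 1024 640 480
```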



They went a step further still and didn't put BASIC into the 16 bit ROM, but rather a simple OS plus a collection of routines, from memory and screen management to floating point ... and an interpreter for a general purpose virtual machine. Today we would call this a bytecode interpreter, like the one used by Java; here the 'language' was called Graphic Programming Language, or GPL. Besides the usual computing operations, it includes a mechanism called XML (*8) to make complex operations like floating point or graphics available as basic 'machine' operations. Using such GPL programs in cartridge GROMs resulted in rather high execution due to its interpreted and optimized nature. In fact, even the BASIC interpreter itself was a GPL application.
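To illustrate the general idea (this is not GPL's real encoding; the opcodes, their numbering and the routine table are invented for this sketch): a bytecode interpreter fetches one byte at a time from a sequential stream, much as GPL code is fetched from GROM, and an XML-style opcode calls into a table of native routines, so that something like a floating point operation costs a single virtual instruction.

```python
import math
from typing import Callable, List

# HYPOTHETICAL opcodes for illustration only; not GPL's encoding.
HALT, PUSH, ADD, XML = 0x00, 0x01, 0x02, 0x0F

# Table of "native" routines reachable through the XML opcode.
# In the 99/4 these would be machine-code routines in console ROM.
NATIVE: List[Callable[[List[float]], None]] = [
    lambda st: st.append(math.sqrt(st.pop())),  # XML 0: square root
    lambda st: st.append(st.pop() * st.pop()),  # XML 1: multiply
]


def run(code: bytes) -> List[float]:
    """Interpret `code` sequentially, one byte per fetch."""
    stack: List[float] = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == HALT:
            return stack
        if op == PUSH:                 # operand: next byte
            stack.append(float(code[pc]))
            pc += 1
        elif op == ADD:
            stack.append(stack.pop() + stack.pop())
        elif op == XML:                # operand: native routine index
            NATIVE[code[pc]](stack)
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#04x}")


# sqrt(9 + 7) * 3
program = bytes([PUSH, 9, PUSH, 7, ADD, XML, 0, PUSH, 3, XML, 1, HALT])
print(run(program))  # [12.0]
```

The design trade-off is visible even at this scale: the bytecode is compact, while every virtual instruction pays for a fetch and a dispatch, which the native routines behind XML help to amortise.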



Bottom line, this was a great design choice to create an extremely versatile system at low cost with incredible options to expand. It could have been great ... except for what happened.



When the basic design was done, three (somewhat) custom chips were needed: the VDP, the sound chip and the 9985 CPU.



During development, the VDP and sound chip were prioritized due to their special features, while the 9985 could be substituted by a replacement circuit: a board with a 9900, 16 bit ROM/RAM and multiplex logic to emulate the 8 bit bus.



The multiplex logic was straightforward, turning each 9900 access into two 8 bit ones. This differed from the planned 9985, which would have made only a single 8 bit access for a byte, as opposed to a word. This wasn't a big deal except for the VDP. A useless full memory cycle for each and every byte accessed on screen was a real showstopper for program development, in particular for games. So the VDP got moved to the 16 bit side of the development setup, despite being an 8 bit device. (*9)
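The cost difference can be sketched by counting 8 bit bus cycles for a sequence of accesses. This is a simplified model that ignores wait states and the VDP's own timing, under the assumption stated above: the planned 9985 would issue one 8 bit cycle for a byte access and two for a word, while the development multiplexer always issues two. The workload, a full screen of 32 x 24 one-byte tile names, is an illustrative assumption.

```python
from typing import Iterable


def bus_cycles(accesses: Iterable[str], always_split: bool) -> int:
    """Count 8 bit bus cycles for a sequence of 'byte'/'word' accesses.

    always_split=True models the 9900 + multiplexer board (every access
    becomes two byte cycles); False models the planned 9985 behaviour.
    """
    total = 0
    for kind in accesses:
        if kind not in ("byte", "word"):
            raise ValueError(f"unknown access kind: {kind!r}")
        if kind == "word" or always_split:
            total += 2   # two 8 bit cycles
        else:
            total += 1   # a lone byte needs only one 8 bit cycle
    return total


screen = ["byte"] * (32 * 24)                   # one full screen of tile names
print(bus_cycles(screen, always_split=False))   # 768  (planned 9985)
print(bus_cycles(screen, always_split=True))    # 1536 (multiplexer)
```

For byte-oriented screen work the multiplexer doubles the bus traffic, which is why moving the VDP to the 16 bit side paid off.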



The VDP and sound chips were delivered mostly on time, but even in early 1979 no 9985 was ready, while the machine was scheduled to premiere for Christmas 1979. So the decision was made to integrate the replacement circuit into the mainboard and use the full 9900 CPU plus the additional circuitry for a first batch, hoping that the 9985 would be ready a few months later.



Well, it wasn't. To make matters worse, management canceled the 9985, as it didn't make much sense from the chip division's point of view; and after all, the 99/4 did work 'fine' without it, didn't it? So the 99/4 continued to be sold with a 9900. (*10) Instead, the 9940 and 9990 (*11) were produced.



When the 99/4A redesign came along, only minor changes (besides the 9918A) were made. They skipped the chance to increase the 256 Bytes of RAM to at least 1 KiB, or to improve the 8 bit bus connection. :(



*1 - This may seem strange to us, as we always associate the 9900 with the somewhat strange 99/4. But it was considered serious competition for the 68k and 8086. Not just because of similar performance (and being available very early - 1976), but more importantly due to the huge software library available, including mature operating systems. Not to mention the fully symmetric and straightforward instruction set.



After all, the 9900 was the single chip implementation of the successful 990 series minis, much like the LSI-11 implemented the PDP-11 - except that the 9900 fit on a single chip. It was used in several workstations and low end 990 minis.



*2 - This is about design decisions, not about when it was first sold.



*3 - At the same time, a PET was delivered with 4/8 KiB, a TRS-80 with 4 KiB and even the Apple II was able to run with just 4 KiB. 32 KiB as minimum memory would have been out of proportion for a home computer. That was a lot, even for professional machines/workstations.



*4 - This isn't just coincidentally similar to the 6500's Zero Page. It's the very same idea of making certain common instructions faster: in the case of the 6502 by shortening the address encoding for ZP (and enabling additional complex addressing modes), and in the case of the 99/4 by full width access.



*5 - Quite similar to 32 bit PC CPUs (386ff) using 16 (or even 8) bit ISA bus for I/O.



*6 - Later 9918 follow-up chips (9938/58), used in MSX2/3, extended the VRAM to up to 192 KiB.



*7 - As usual, reality is cruel to us. While GROM can have any size between 1 Byte and 64 KiB (with two byte addresses - longer addresses could provide more), TI manufactured them only with 6 KiB. But at least, GROMs could be combined if not occupying the same base address. So the maximum external GROM data was limited to 6x5x16=480 KiB ... well, still not bad, but way down from the original possible 1 MiB.



*8 - :)) No, not eXtensible Markup Language, but eXtensible Machine Language. Effectively a way to turn certain library functions like floating point or such into single virtual machine operations.



*9 - The memory behaviour eventually also became the subject of the first speed-up mod for the 99/4 - but that's a different story.



*10 - This was also the reason for the quite unusual move of raising the 99/4's price tag by almost 20%: they had to make up for higher-than-calculated component costs, mainly the CPU.



*11 - The 9990 was available much later, and used in compatible machines. Not to be confused with Yamaha's projected enhanced, MSX3 compatible VDP.






"16 external GROM with 40 KiB each" ... the references I've seen suggest TI never actually manufactured a GROM chip that contained more than 8KiB, but I'd be happy if they turned out to be wrong. Where does this 40KiB figure come from?

– Jules
Sep 14 '18 at 8:20






@Jules that's the theoretical maximum for the (BI)OS. The amount was limited by only scanning for 16 different GROM addresses during initialisation, and by the use of a 16 bit GROM address with 24 KiB (3x8KiB) reserved for the internal GROMs. And yes, only 8 KiB types were produced, but GROMs could be made to answer to different 'base' addresses, making it possible to combine several into a bigger virtual one.

– Raffzahn
Sep 14 '18 at 12:40






@jules In fact the whole issue is a bit weirder. For one, GROM could come in any size up to 64 KiB (with a two byte address), but in reality TI produced only 6 KiB chips. In theory 10 of them could still be joined within a 64 Ki address space (in one port), but TI only manufactured them for addresses that were divisible by 8 Ki. So only 8 were possible, and 2 Ki of address space per chip got wasted. The 99/4 OS did scan up to 16 base addresses, so a total of 128 GROMs with 1 MiB (well, 768 KiB) was possible, but the 3 console GROMs answered to all bases, so only 16x5 GROMs could be used. Still not so bad.

– Raffzahn
Sep 15 '18 at 20:53






"but also let BASIC store the program within video memory" - but doesn't that mean it's byte-at-a-time? That would seem like a rather major problem for RETURN and NEXT.

– Maury Markowitz
Sep 17 '18 at 21:16






@MauryMarkowitz Why? Reading of BASIC code is sequential. A repositioning, as needed for RETURN or NEXT, is done by loading the saved pointer (from wherever) and storing it into the access register. Not much different from copying it into a ZP pointer on a 6502. Or did I miss something in your remark?

– Raffzahn
Sep 17 '18 at 22:18



If you look at the description of the databus multiplexer e.g. here



The TMS9900 being a 16-bit processor, it has 16 data lines and 15 address lines. However, only the console ROMs (>0000-1FFF) and the scratch-pad RAM (>8300-83FF) are accessed in this way. Peripheral memory in the range >2000-7FFF and >A000-FFFF is accessed via an 8-bit data bus, and a 16-bit address bus. A small circuit in the console multiplexes the data bus, creates the 16th address line (A15) and puts the TMS9900 on hold while the least significant byte (the one at the odd address) is processed.



The VDP is hooked on the 16-bit bus, but only to lines D0-D7, i.e. it accesses the most significant byte only. The GROMs, the sound chip, and the speech synthesizer are hooked to the multiplexed 8-bit bus. However, their selection logic senses A15, so they only react to even addresses. In both cases the multiplexer (uselessly) puts the CPU on hold, since all these devices map in the range >8400-9FFF.



you can see that the VDP is connected in exactly that way.
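The address map in the quoted description can be summarised as a small decoder. This sketch only covers the ranges named in the quote (in TI's `>` hex notation); anything in >8000-9FFF outside the scratch-pad and the >8400-9FFF device area is reported as not covered, and the category strings are made up for illustration.

```python
def decode(addr: int) -> str:
    """Classify a TI-99/4(A) CPU address using the quoted memory map."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address out of 16 bit range")
    if addr <= 0x1FFF:
        return "console ROM (16 bit bus)"
    if addr <= 0x7FFF:
        return "peripheral memory (8 bit multiplexed bus)"
    if 0x8300 <= addr <= 0x83FF:
        return "scratch-pad RAM (16 bit bus)"
    if 0x8400 <= addr <= 0x9FFF:
        return "memory-mapped devices (CPU held by multiplexer)"
    if addr >= 0xA000:
        return "peripheral memory (8 bit multiplexed bus)"
    return "not covered by the quoted map"


for a in (0x0000, 0x2000, 0x8300, 0x8800, 0xA000, 0x8000):
    print(f">{a:04X}: {decode(a)}")
```

Only two small windows, 8 KiB of ROM and 256 bytes of RAM, get the full 16 bit path; everything else either goes through the multiplexer or at least waits for it.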



However, it would have hurt too much to throw away half of the peripheral memory (a 64KB address range isn't that much in the first place).



The multiplexer consists of a few chips, and isn't that expensive.



As for why the peripheral memory doesn't have a 16 bit databus: I don't know. Possibly they wanted to keep the peripherals simpler (at the cost of making the main board a bit more complex).



Having an 8-bit databus instead of a 16-bit databus for the periphery has actually worse consequences:



[...] the data bus has to be multiplexed. This is achieved by a small logic circuit inside the TI console: any memory access to the range >2000-7FFF and >A000-FFFF is multiplexed. The least significant byte (odd address, D8-D15) is passed first, and this is signaled by a high level on the additional address line A15. For input operations, this byte is latched into the console by a 74LS373 D-type latch. Then the multiplexer puts the CPU on hold (with the READY line) and places the most significant byte on the data bus (even address, D0-D7), which is signaled by a low level on A15. The operation will only be completed after 4 clock cycles on Phi3*, by releasing the block on the TMS9900.



Concretely, this means several drawbacks:



So the "an 8-bit processor was replaced by a 16-bit processor" explanation from Wikipedia mentioned in another answer sounds very plausible.
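The multiplexed read sequence in the quoted description can be modelled step by step. This is a sketch of the logic only, with no timing and no READY/wait-state modelling; `read_byte` is a hypothetical callback standing in for the 8 bit peripheral bus. TI's numbering is used: D0 is the most significant data bit, and A15 = 1 selects the odd (least significant) byte.

```python
from typing import Callable, Dict, List, Tuple


def mux_read(word_addr: int, read_byte: Callable[[int], int],
             trace: List[Tuple[int, int]]) -> int:
    """Assemble a 16 bit word from two 8 bit bus reads, odd byte first.

    word_addr is the address the 9900 puts out; its lowest bit is ignored
    because the CPU has no A15 of its own.
    """
    base = word_addr & 0xFFFE
    # Cycle 1: A15 = 1 (odd address, D8-D15); byte is held in the latch.
    latched = read_byte(base | 1) & 0xFF
    trace.append((1, latched))
    # Cycle 2: A15 = 0 (even address, D0-D7) while the CPU is held.
    msb = read_byte(base) & 0xFF
    trace.append((0, msb))
    # Both halves are released to the CPU as one word.
    return (msb << 8) | latched


memory: Dict[int, int] = {0x2000: 0x12, 0x2001: 0x34}
steps: List[Tuple[int, int]] = []
word = mux_read(0x2000, lambda a: memory.get(a, 0), steps)
print(hex(word), steps)  # 0x1234 [(1, 52), (0, 18)]
```

Every CPU access to this range costs two bus cycles, even when the instruction only wanted one byte, since the 9900 cannot tell the multiplexer which half it needs.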






"However, it would have hurt too much to throw away half of the peripheral memory (a 64KB address range isn't that much in the first place)." ... half of the memory space is wasted anyway. Whenever the CPU generates an address (15-bit address bus) the multiplexer hits both the odd and the even address (generating both values of the least significant bit, which for some reason TI decided to call A15). The peripherals all respond only when A15=0, but if you connected another which responded when A15=1 it would be triggered whenever the original was used, which isn't desirable. Besides ...

– Jules
Sep 13 '18 at 13:22






... TI had a great solution to address space issues in the form of their GROM/GRAM system, which is effectively a serially-accessed ROM or RAM, containing up to 8KiB of data IIRC (theoretical limit 64KiB, but TI never manufactured a chip that large), that occupies only 8 addresses on the CPU memory map. The entire system was built to work around this; the CPU can execute code from them, and so on. As the TI99/4A was designed to allow 16 GR*Ms to be connected, it was capable of supporting 128KiB of ROM/RAM in addition to the CPU-bus based expansions (up to 32.25KiB) and VDP RAM (16KiB) = ~176KiB.

– Jules
Sep 13 '18 at 13:28






How hard would it have been to say that accesses to addresses where A15:A14==10 would be processed as 16-bit bus accesses, those with A15:A14=0x would be treated as 8-bit addresses on the 8-bit bus, and those with A15:A14=11 would be treated as a pair of 8-bit accesses? Being unable to perform an isolated 8-bit operation seems unnecessarily confining from both a semantic and performance standpoint.

– supercat
Sep 13 '18 at 15:23






@supercat - note that as per Raffzahn's answer, the external bus multiplexer was supposed to be a temporary solution to allow development of the design to continue while the 9985 processor was being developed; in the 9985 it would have been integrated with the processor and therefore know whether the instruction being executed was an 8 or 16 bit one. It only became a permanent solution when they came to start production and the 9985 still wasn't available, by which time it was far too late to redesign.

– Jules
Sep 14 '18 at 7:50







@Jules - I'm new to the TI, so these GROMs, was it basically a ROM that was accessed like an 8-byte-at-a-time tape? Was this automated or did you have to change the address of the window in code? And what was GRAM used for? Just the scratch? I assume main RAM was random access?!

– Maury Markowitz
Sep 17 '18 at 21:10



So my question is why the TI-99/4 had these two databuses. Why not put everything on a single databus?



The TMS9900 CPU only has three on-chip registers, none of them being so much as an accumulator. I.e., every mathematical and logical operation occurs in RAM registers. From Wikipedia:



Only the Program Counter, Status Register, and Workspace Pointer registers are on the chip; all work registers are kept in RAM at an address indicated by the Workspace Pointer. 16 registers are available at any given time, and a context switch instruction which changed to another workspace automatically allows fast context switches compared to other processors which may have had to store and restore the registers. For CPU RAM, the machine has only 256 bytes of "scratchpad" memory to support the storage of workspaces. This memory is placed directly on the 16-bit bus with zero wait states, making it much faster than any other memory available to the system.



The professional versions of the TMS9900 family had on-chip cache which handled the registers. The home version instead used 256 bytes of "scratchpad RAM" to emulate cache. While faster than other RAM on the system, the effectively registerless design led to one of the slowest 16-bit computers ever. Proponents say that this design allowed for extremely quick context switches. True. But once in a context (i.e. actually doing something), all operations were off-chip.
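A minimal model of what the quoted passage describes: the 16 work registers are just 16 words of RAM starting at the Workspace Pointer, so a register operation is a memory operation, while a context switch (BLWP) mostly means loading a new pointer. Per the 9900's documented behaviour, BLWP takes the new WP and PC from a two-word vector and saves the old WP, PC and ST in the new workspace's R13, R14 and R15; RTWP restores them. The class and method names are made up; status flags, instruction fetches and timing are not modelled, and the vector and workspace addresses in the usage are arbitrary.

```python
class TMS9900Model:
    """Registers-in-RAM model: only PC, WP and ST live on the 'chip'."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)   # 64 KiB, big-endian words
        self.pc = self.wp = self.st = 0
        self.mem_accesses = 0           # counts word reads and writes

    def read_word(self, addr: int) -> int:
        self.mem_accesses += 1
        a = addr & 0xFFFE
        return (self.mem[a] << 8) | self.mem[a + 1]

    def write_word(self, addr: int, value: int) -> None:
        self.mem_accesses += 1
        a = addr & 0xFFFE
        self.mem[a], self.mem[a + 1] = (value >> 8) & 0xFF, value & 0xFF

    def reg(self, n: int) -> int:
        """Register n is simply the word at WP + 2n."""
        return self.read_word(self.wp + 2 * n)

    def set_reg(self, n: int, value: int) -> None:
        self.write_word(self.wp + 2 * n, value & 0xFFFF)

    def add(self, src: int, dst: int) -> None:
        """A Rsrc,Rdst: even a register add reads and writes RAM."""
        self.set_reg(dst, self.reg(src) + self.reg(dst))

    def blwp(self, vector: int) -> None:
        """Context switch: new WP/PC from vector, old state into R13-R15."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.read_word(vector)
        self.pc = self.read_word(vector + 2)
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)

    def rtwp(self) -> None:
        """Return: restore ST, PC and WP from R15, R14 and R13."""
        self.st, self.pc, self.wp = self.reg(15), self.reg(14), self.reg(13)


cpu = TMS9900Model()
cpu.wp = 0x8300                 # workspace in scratch-pad RAM
cpu.set_reg(1, 5)
cpu.set_reg(2, 7)
cpu.mem_accesses = 0
cpu.add(1, 2)                   # A R1,R2
used = cpu.mem_accesses         # operand accesses only
print(cpu.reg(2), used)         # 12 3

cpu.write_word(0x2000, 0x83A0)  # vector: new WP
cpu.write_word(0x2002, 0x4000)  # vector: new PC
cpu.pc = 0x0100
cpu.blwp(0x2000)
print(hex(cpu.wp), hex(cpu.pc), hex(cpu.reg(13)))  # 0x83a0 0x4000 0x8300
cpu.rtwp()
print(hex(cpu.wp), hex(cpu.pc))                    # 0x8300 0x100
```

The model makes both sides of the argument concrete: the context switch is a handful of word moves, but a plain register add already costs three memory accesses, which is why it matters so much that the workspace sits in the 16 bit scratch-pad RAM.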






so you reckon this entire contraption was just to provide a little double-speed memory? But then my question is essentially: why not wire all the memory to that faster bus?

– Wilson
Sep 13 '18 at 15:40






@Wilson, yes, although "just to provide a little double-speed memory" seems to be missing the point. There are no work registers on the chip! Nor is there a RAM cache. To make up for those flaws they came up with the kludge of scratchpad RAM. Nothing can be accomplished, not even a simple add of two virtual registers, without accessing memory. Every operation involves not just CPU time but double bus access time. Horrible design, IMO! // Making all memory as fast as scratchpad RAM would have been expensive.

– RichF
Sep 13 '18 at 16:46






@RichF The low speed comes not so much from the memory-to-memory architecture, but rather from the comparatively low clock frequency. In the 99/4 the CPU was operating at only 3 MHz. This is way below what the CPU could do (10 MHz) and even below what contemporary RAM (like that used in an Apple) would have enabled (~8 MHz).

– Raffzahn
Sep 14 '18 at 0:12






@Raffzahn Comparing specs at the time would have led one to believe that a 16-bit TI 99/4 running at 3 MHz would be significantly faster than the 8-bit processors of the C64 and Apple II running at 1 MHz. Yet I know from experience that the C64 could run circles around the TI 99/4, with an oval thrown in once in a while just for good measure. ;) I'm sure the reasons for this went beyond the registerless design, but seeing that no other popular CPU has ever chosen this route, I'll stand by the idea that it was a subpar design, especially with no on-chip cache memory.

– RichF
Sep 14 '18 at 6:50






@RichF - part of the problem was the badly designed built-in BASIC interpreter, which was extremely slow and limited. With the Extended BASIC cartridge, the machine was much more capable. And then the fact that there was only 256 bytes of RAM actually attached to the bus, so almost all memory access had to be done via the display processor, which meant 3 bus accesses (= 42 processor cycles, because a MOVB instruction takes 14 cycles) were required for every access to main memory. Also, with average instruction time being about 13 cycles, a 3MHz TMS9900 is pretty similar to a 1MHz 6502.

– Jules
Sep 14 '18 at 7:07


