# **MIPS® Toolchain Specifics**

Document Number: MD00624
Revision 01.53
June 01, 2009

MIPS Technologies, Inc.
955 East Arques Avenue
Sunnyvale, CA 94085-4521

Copyright © 2008,2009 MIPS Technologies, Inc. All rights reserved.

Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries.

This document contains information that is proprietary to MIPS Technologies, Inc. ("MIPS Technologies"). Any copying, reproducing, modifying or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS Technologies or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines.

Any document provided in source format (i.e., in a modifiable form such as in FrameMaker or Microsoft Word format) is subject to use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions. UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS TECHNOLOGIES, INC.

MIPS Technologies reserves the right to change the information contained in this document to improve function, design or otherwise. MIPS Technologies does not assume any liability arising out of the application or use of this information, or of any error or omission in such information. Any warranties, whether express, statutory, implied or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose, are excluded.

Except as expressly provided in any written license agreement from MIPS Technologies or an authorized third party, the furnishing of this document does not give recipient any license to any intellectual property rights, including any patent rights, that cover the information in this document. The information contained in this document shall not be exported, reexported, transferred, or released, directly or indirectly, in violation of the law of any country or international law, regulation, treaty, Executive Order, statute, amendments or supplements thereto. Should a conflict arise regarding the export, reexport, transfer, or release of the information contained in this document, the laws of the United States of America shall be the governing law. The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government ("Government"), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party. 
MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS-3D, MIPS16, MIPS16e, MIPS32, MIPS64, MIPS-Based, MIPSsim, MIPSpro, MIPS Technologies logo, MIPS-VERIFIED, MIPS-VERIFIED logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, 5K, 5Kc, 5Kf, 24K, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, R3000, R4000, R5000, ASMACRO, Atlas, "At the core of the user experience.", BusBridge, Bus Navigator, CLAM, CorExtend, CoreFPGA, CoreLV, EC, FPGA View, FS2, FS2 FIRST SILICON SOLUTIONS logo, FS2 NAVIGATOR, HyperDebug, HyperJTAG, JALGO, Logic Navigator, Malta, MDMX, MED, MGB, OCI, PDtrace, the Pipeline, Pro Series, SEAD, SEAD-2, SmartMIPS, SOC-it, System Navigator, and YAMON are trademarks or registered trademarks of MIPS Technologies, Inc. in the United States and other countries.

All other trademarks referred to herein are the property of their respective owners.

# **Table of Contents**

- Chapter 1: Introduction
  - 1.1: About this Document
  - 1.2: Contents
- Chapter 2: MIPS32 Compiler Options
  - 2.1: Architecture Flags
    - 2.1.1: Endianness Flags
    - 2.1.2: Instruction Set Flags
    - 2.1.3: CPU Flags
      - 2.1.3.1: Other CPU-specific Options
  - 2.2: Optimization Options
    - 2.2.1: Optimizing for Speed
    - 2.2.2: Optimizing for Size
      - 2.2.2.1: Code and Data Garbage Collection
  - 2.3: GP-relative Addressing
  - 2.4: Unaligned Data
  - 2.5: Software Floating Point
  - 2.6: MIPS16® ASE Support
    - 2.6.1: Global Variables and MIPS16 Code
    - 2.6.2: Global Register Variables
    - 2.6.3: Divide by Zero Checks (-mcheck-zero-division)
    - 2.6.4: Execute-only Code / Split I-D RAM
    - 2.6.5: Generating MIPS16® Code
    - 2.6.6: Sibling Call Optimization
    - 2.6.7: Main differences between MIPS16® and MIPS16e™ Code
  - 2.7: Predefined Preprocessor Macros
- Chapter 3: GDB Debugging with the MDI interface
  - 3.1: MDI Debugging
    - 3.1.1: MDI Debugging with the MIPSsim™ Simulator
      - 3.1.1.1: Configuring the MIPSsim™ Simulator for GDB
      - 3.1.1.2: Selecting the MIPSsim™ CPU
      - 3.1.1.3: Building for a MIPSsim™ Target
      - 3.1.1.4: Downloading to a MIPSsim™ ROM Target
      - 3.1.1.5: Non-standard MIPSsim™ Configurations
    - 3.1.2: MDI Debugging with an EJTAG Probe
      - 3.1.2.1: Configuring your Probe for GDB
      - 3.1.2.2: Selecting the EJTAG CPU
      - 3.1.2.3: Building for an EJTAG-connected Target
      - 3.1.2.4: Resetting the CPU
    - 3.1.3: MDI Debugging Tips
      - 3.1.3.1: Command line arguments
      - 3.1.3.2: MDI Host File I/O
      - 3.1.3.3: MDI Variables and Commands
      - 3.1.3.4: MDI Troubleshooting
  - 3.2: Debugging with MIPS® MT ASE
    - 3.2.1: Debugging LLMT Applications
      - 3.2.1.1: Thread Status
      - 3.2.1.2: TC-specific Breakpoints
      - 3.2.1.3: Thread-specific Commands
      - 3.2.1.4: Resuming threaded execution
    - 3.2.2: Debugging Multiple VPEs
      - 3.2.2.1: Multiple VPEs with FS2 probe
      - 3.2.2.2: Multiple VPEs on the MIPSsim™ Simulator
    - 3.2.3: Debugging AP/RP Applications
      - 3.2.3.1: Using the SP Debugging Daemon
      - 3.2.3.2: AP/RP Debugging with EJTAG Probe
    - 3.2.4: Debugging SMVP/SMTC Programs
      - 3.2.4.1: SMVP/SMTC using MIPSsim® Simulator
      - 3.2.4.2: SMVP/SMTC using FS2 Probe and Group Debugging
  - 3.3: Debugging with the GNU Simulator
  - 3.4: Remote Serial Port Debugging
    - 3.4.1: Serial Debugging with the YAMON™ Monitor
      - 3.4.1.1: YAMON™ Monitor - Serial Download
      - 3.4.1.2: YAMON™ Monitor - TFTP Download
    - 3.4.2: Serial Comms Fault Finding
  - 3.5: Debugging C++
- Chapter 4: Manual Downloading
  - 4.1: Evaluation Board Download
  - 4.2: PROM Programmer Download
    - 4.2.1: Other Techniques
- Chapter 5: Porting an ISO / ANSI C Program
  - 5.1: Common Problems when Converting to MIPS® Architecture
- Chapter 6: MIPS® Architecture Intrinsics
  - 6.1: Intrinsics for Byte Swapping
  - 6.2: Intrinsics for MIPS32® Architecture
  - 6.3: Intrinsics for MIPS32® Release 2 Architecture
  - 6.4: Intrinsics for MIPS64® Release 2 Architecture
  - 6.5: Intrinsics for CorExtend™ Extension
  - 6.6: Intrinsics for COP2 Extension
  - 6.7: Intrinsics for SmartMIPS® ASE
  - 6.8: Intrinsics for Paired-single/MIPS-3D® Architecture
  - 6.9: Intrinsics for MIPS MT ASE
  - 6.10: Intrinsics for MIPS DSP ASE
    - 6.10.1: Vector Data Types
    - 6.10.2: Scalar data types
    - 6.10.3: Compiler Builtin Functions
    - 6.10.4: Compiler Builtins for Second Revision
    - 6.10.5: Intrinsics for Atomic R-M-W
    - 6.10.6: Intrinsics for Data Prefetch
- Chapter 7: CPU Management
  - 7.1: Cache Maintenance
  - 7.2: TLB Maintenance
  - 7.3: Hardware Watchpoints
  - 7.4: System Coprocessor (CP0) Intrinsics
    - 7.4.1: Common CP0 Registers
    - 7.4.2: CP0 Registers of MIPS32®/MIPS64® Architecture
    - 7.4.3: CP0 Registers of MIPS32®/MIPS64® Release 2 Architecture
    - 7.4.4: Shadow Sets of MIPS32®/MIPS64® Release 2 Architecture
    - 7.4.5: CP0 Registers of MIPS® MT ASE
  - 7.5: Miscellaneous System Support
  - 7.6: Floating Point Coprocessor (CP1)
    - 7.6.1: Coprocessor 1 Emulation
  - 7.7: IEEE-754 Floating Point Emulation Library
- Appendix A: References
- Appendix B: Revision History

# **List of Tables**

- Table 2.1: List of -mtune= Names
- Table 2.2: MIPS Predefined Macros
- Table 3.1: MIPSsim Configuration Settings
- Table 3.2: Host O/S Serial Port Devices
- Table 7.1: Hardware Watchpoint Attributes
- Table 7.2: Watchpoint Return Codes
- Table 7.3: Register Access Intrinsics

# Introduction

# 1.1 About this Document

Throughout this document, the command prefix "mips-sde-elf-" is used (assuming that your target is bare-metal/ELF), for example mips-sde-elf-gdb. If your target is Linux, the command prefix is instead "mips-linux-gnu-".

Items listed in square brackets, like [Sweet99], are references to other documents - see the References Appendix for the full description of these other documents.

# 1.2 Contents

This document deals with options that are specific to MIPS architecture targets when using the GNU tools such as gcc, gdb, etc.

# **MIPS32 Compiler Options**

The "MIPS Options" section in the GCC manual lists those compiler options which are specific to MIPS-based processors. This chapter provides some additional information about these options, and how you might use them.

NOTE: Throughout this document, the command prefix "mips-sde-elf-" is used (assuming that you're using the bare-metal/ELF toolchain), for example mips-sde-elf-gdb. If your target is Linux, the command prefix is instead "mips-linux-gnu-".

# 2.1 Architecture Flags

There are several flags which adjust the class of instructions generated by the compiler or assembler to match your particular CPU type. You can get more information about the architectural features and choices mentioned here in [Sweet99].

# 2.1.1 Endianness Flags

The most fundamental architectural switch controls whether to generate big-endian or little-endian code. MIPS architecture processors may be configured either way, but the rest of the hardware usually determines which way your system must work.
Software has to be compiled to match the way the CPU is configured, or it will fail every time you perform a sub-word load or store. It is possible to write bi-endian code by very careful assembler coding (e.g. by performing all data accesses as aligned word transfers), but this is likely to be required for only the first few instructions after a hardware reset, until you have configured the CPU and/or device endianness correctly.

- **-EB** Generate code and data for a big-endian CPU.
- **-EL** Generate code and data for a little-endian CPU.

# 2.1.2 Instruction Set Flags

The compiler supports all official and currently implemented 32- and 64-bit MIPS instruction set architectures (ISAs). However, it will only generate code compatible with the base MIPS32 ISA unless one of the following switches is used:

#### -mips1

Issue instructions from the original MIPS I ISA. Compiler/assembler only - no library object files are provided.

#### -mips2

Issue instructions from the MIPS II ISA (branch likely; square root; 64-bit floating point load/store; faster floating point truncate). Compiler/assembler only - no library object files are provided.

#### -mips3

Issue instructions from the MIPS III ISA (64-bit instructions; 32 floating point registers). Compiler/assembler only - no library object files are provided.

#### -mips4

Issue instructions from the MIPS IV ISA (floating point multiply-add/sub, indexed addressing, reciprocal, etc.). Compiler/assembler only - no library object files are provided.

#### -mips32

The new, rationalised, 32-bit MIPS32 instruction set defined by MIPS Technologies in 1998/99. It's not really very different from **-mips2**, but it picks up the conditional move instructions and rationalises the integer multiply/accumulate instructions (which were formerly CPU-specific). The "branch likely" instructions are officially deprecated in MIPS32, but the compiler will still generate them when tuning for CPUs for which it knows they don't have an adverse performance impact.
Compiler/assembler only - no library object files are provided.

#### -mips32r2

The Release 2 update to MIPS32 adds a few new instructions.

#### -mips64

MIPS Technologies' rationalised 64-bit MIPS64 instruction set, which is a superset of both **-mips4** (at the user level) and **-mips32**. Compiler/assembler only - no library object files are provided.

An update of the specifications added some useful new features to the MIPS32 ISAs in September 2002. Many of these features are for the OS only, but there are also a few new user-level instructions:

- Bit-rotate: previous MIPS ISAs had only shifts. The compiler will make use of the hardware rotate instruction if your source code is written so as to perform the rotate in a single expression. For example:

```
unsigned int a, b, r;

/* fixed rotate right by 8, or left by 24 */
b = (a >> 8) | (a << 24);

/* variable rotate right (assumes 0 < r < 32) */
b = (a >> r) | (a << (32 - r));

/* variable rotate left (assumes 0 < r < 32) */
b = (a << r) | (a >> (32 - r));
```

- Bit-field operations: single-instruction unsigned bitfield extract and insert instructions make for more efficiency when doing just that. Note that gcc treats bitfields as signed if you don't use an explicit unsigned type modifier; use the **-funsigned-bitfields** option to change that behaviour. The compiler will sometimes use these instructions when given simple and obvious mask and shift expressions. In cases where it doesn't, you can use the explicit insert/extract intrinsics.
- Byte-swap instructions: the new instructions wsbh, dsbh and dshd swap bytes within halfwords, or halfwords within doublewords, in a register. So you can do a full 32-bit or 64-bit byte-swap in just two instructions. The compiler will not generate these instructions automatically, but you can access them via intrinsics.
- Sign-extend instructions: bytes and 16-bit values can already be sign-extended automatically when loaded from memory; these new instructions improve code for data which is already in registers.
- 64-bit FPU: a MIPS32 Release 2 CPU may be paired with a 64-bit FPU, and the extra 16 registers will be used by the compiler if you give it the **-mfp64** option.

Once you've defined your base instruction set, there is a collection of "instruction set extensions" which you can enable:

#### -mips16

Compile using the MIPS16 "ASE". Each MIPS16 instruction is only 16 bits in size, and although a compiler must use more MIPS16 instructions to compile a function than would be required with the MIPS32 ISA, it allows simple integer code to be compiled with a 30-40% saving in space. Use of this option is a decision with lots of consequences: see the longer discussion in Section 2.6 "MIPS16® ASE Support" below.

**Warning**: although the name "MIPS16" seems to fit in with "MIPS32" and "MIPS64", it really is something quite different. In fact, MIPS16 encodings are available for 64-bit instructions too.

The MIPS16 ASE is not available on all CPUs. It also isn't possible to write a complete system using MIPS16 instructions, since some vital instructions (CPU control, floating point, etc.) have no MIPS16 encoding.

MIPS16 instructions will probably only ever be generated by compiled code, so you will only ever see MIPS16 assembler code when looking at disassemblies or compiler intermediate files. Assembler source files must request generation of MIPS16 code using an explicit .set mips16 or .set mips16e directive; the command line option is not passed to the assembler by the mips-sde-elf-gcc front end.

The MIPS16 ASE is only available on bare-metal/ELF targets. It is not available for the GNU/Linux target.

#### -mips16e

The MIPS16e ASE is an extension to the MIPS16 encodings, built on the basis of experience with some large programs and achieving a useful improvement in density with a few extra instructions. This variant is standard on MIPS32 CPUs; in fact, the combination of flags "-mips32 -mips16" implies -mips16e.
The MIPS16e ASE is only available on bare-metal/ELF targets. It is not available for the GNU/Linux target.

#### -msmartmips

This option is only valid if you've selected a MIPS32/MIPS64 instruction set, and SmartMIPS cores always implement MIPS16 too. It allows the toolchain to exploit the SmartMIPS extensions to the base MIPS32 ISA: in particular the indexed load (used with grateful thanks by the compiler) and enhanced multiplier instructions - the latter available only through assembler code or special C intrinsics. SmartMIPS CPUs also anticipate the bit-rotate instruction from MIPS32 Release 2, as in **-mips32r2** above.

#### -mpaired-single

For the MIPS32 Release 2 and MIPS64 ISAs only, where 32 x 64-bit floating point registers are enabled (i.e. **-mfp64**), this flag enables use of the "paired single" SIMD floating point extension, which provides instructions to do two single-precision (32-bit) floating point operations at once, keeping the operands in pairs within a 64-bit register. More details on this option can be found in the [Gcc] manual.

#### -mips3d

Enables the MIPS-3D ASE, which includes additional paired-single instructions that are designed to improve the performance of 3D graphics operations. Implies **-mpaired-single**.

#### -mdsp

This option is only valid if you've also selected the MIPS32 Release 2 instruction set. It tells the compiler to allow the use of the MIPS DSP ASE, either automatically where possible, by using vector types, or by use of builtin intrinsics. The ASE is also enabled if the **-march=** option specifies one of the CPUs from the 24ke/34k/74k family.

#### -mno-dsp

Prevents the compiler from generating MIPS DSP instructions, even if the selected CPU architecture would support them.

#### -mmt

This option is only valid when you've selected the MIPS32 Release 2 instruction set. It has no direct effect on the compiler, but instructs the assembler to allow the MIPS MT ASE instructions.
These instructions can be generated from C code by using the intrinsics.

A CPU which supports a given ISA will happily run code compiled for the previous variants with which it's backwardly compatible.

In practice the first criterion for choosing which level to go for is whether you want to use 64-bit integer data types, which are available only with **-mips64**. Once you've chosen the integer data width, you'll get small performance increments by choosing the most specialised (usually highest-numbered) instruction set which matches your CPU; you'll make your binary program more portable by using the lowest number.

The MIPS32 instruction set (or its Release 2 variant) is usually best for applications which don't use 64-bit integer variables, and which don't use floating point heavily - even when you've got a 64-bit processor, since you don't waste data cache space storing unnecessary sign extensions.

# 2.1.3 CPU Flags

The target CPU type may be specified using the compiler's **-mtune=** option. This allows the compiler to optimize the scheduling of instructions to match your CPU's pipeline. If it is not specified, then the compiler picks the most generic CPU type which matches your requested instruction set (e.g. 4Kc® for **-mips32**), but this may generate sub-optimal code for faster CPUs. Specifying the CPU type also allows the compiler to make more intelligent choices about CPU-specific features, such as the optional presence of fast or slow multipliers.

In addition to this, the **-march=** option may be used to specify the precise set of instructions and features provided by the target CPU. It also selects the pipeline scheduling parameters if **-mtune=** is not used explicitly.
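
To check which architecture and tuning names a particular combination of **-march=** and **-mtune=** flags actually selected, one option is to report the string macros `_MIPS_ARCH` and `_MIPS_TUNE`, which GCC predefines when compiling for a MIPS target. The following is a minimal sketch rather than part of the toolchain: the helper names `build_arch`, `build_tune` and `print_build_target` are invented for illustration, and the fallback strings exist only so that the file also compiles on a non-MIPS host.

```c
#include <stdio.h>

/* _MIPS_ARCH and _MIPS_TUNE are string macros predefined by GCC when
   compiling for a MIPS target. The "unknown" fallbacks are illustrative
   only and let this file build on other hosts. */
#ifdef _MIPS_ARCH
#define BUILD_ARCH _MIPS_ARCH
#else
#define BUILD_ARCH "unknown"
#endif

#ifdef _MIPS_TUNE
#define BUILD_TUNE _MIPS_TUNE
#else
#define BUILD_TUNE "unknown"
#endif

/* Hypothetical helpers: return the names selected at compile time. */
const char *build_arch(void) { return BUILD_ARCH; }
const char *build_tune(void) { return BUILD_TUNE; }

/* Print both names, for example from a start-up banner. */
void print_build_target(void)
{
    printf("arch=%s tune=%s\n", build_arch(), build_tune());
}
```

Calling `print_build_target()` early in a test program makes it easy to spot a build in which **-mtune=** was left at its generic default.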
Table 2.1 List of -mtune= Names

| -mtune= | -mips32 | -mips32r2 | -mips64 | Comments |
|---------|---------|-----------|---------|----------|
| 4km, 4kc | X | | | 32-bit synthesisable 4Kc and 4Km cores, with fast multiplier |
| 4kp | X | | | 32-bit synthesisable 4Kp core, with slow multiplier |
| 4kem, 4kec | X | X | | 32-bit synthesisable 4KEc and 4KEm cores, with fast multiplier |
| 4kep, m4k | X | X | | 32-bit synthesisable 4KEp and M4K cores, with slow multiplier |
| 4ksc | X | | | 32-bit synthesisable 4KSc core with SmartMIPS™ ASE |
| 4ksd | X | X | | 32-bit synthesisable 4KSd core with SmartMIPS™ ASE |
| 5kc, 5kf | X | | X | 64-bit synthesisable 5K core family; the 5Kf core has an FPU |
| 20kc | X | | X | 64-bit 20Kc hard core |
| 24kc, 24kf2_1, 24kf1_1 | X | X | | 32-bit synthesisable 24K core family; the 24kf2_1 option tunes for a double-precision FPU running at half the integer pipeline frequency, while the 24kf1_1 option tunes for an FPU running at the same frequency as the integer pipeline. |
| 24kec, 24kef2_1, 24kef1_1 | X | X | | Enhanced version of the 24Kc and 24Kf cores, with additional features such as the MIPS DSP ASE. The 24kef2_1 option tunes for a double-precision FPU running at half the integer pipeline frequency, while the 24kef1_1 option tunes for an FPU running at the same frequency as the integer pipeline. |
| 34kc, 34kf2_1, 34kf1_1 | X | X | | 32-bit synthesisable 34K core family, which supports the MIPS MT and MIPS DSP ASEs. The 34kf2_1 option tunes for a double-precision FPU running at half the integer pipeline frequency, while the 34kf1_1 option tunes for an FPU running at the same frequency as the integer pipeline. |
| 74kc, 74kf3_2, 74kf2_1, 74kf1_1 | X | X | | 32-bit synthesisable, superscalar 74K core family, which supports revision 2 of the MIPS DSP ASE. The 74kf2_1 option tunes for a double-precision FPU running at half the integer pipeline frequency, while the 74kf1_1 option tunes for an FPU running at the same frequency as the integer pipeline. The 74kf3_2 option tunes for a double-precision FPU running at 2/3 the integer pipeline frequency. |

NOTE: This release of the toolchain only supplies library object files which are compiled with the MIPS32R2 version of the ISA.

### 2.1.3.1 Other CPU-specific Options

You can control some features at a still finer level where necessary:

#### -mbranch-likely

Enable "branch likely" instructions with **-mips32** and **-mips64**, even though these instructions are officially deprecated.

#### -mno-branch-likely

Don't use "branch likely" instructions.

#### -mcheck-zero-division

Generate code to check for integer divide by zero. Checking for integer divide overflow (range checking) is disabled by default.

#### -mno-check-zero-division

Don't generate code to check for integer divide by zero - checking is the default, except with **-mips16**.

#### -mhard-float

Emit hardware floating point instructions - this is the default.

#### -msoft-float

Emit calls to a software floating point emulation library.

#### -mno-float

This option is treated by the compiler's code generator as equivalent to **-msoft-float**, i.e. any use of floating point will generate calls to emulation functions. However, it also instructs *mips-sde-elf-gcc* to link your program with libraries which do not include those emulation functions (thus causing a linker warning if they are called) and which also omit all hidden floating point support code, such as handling of floating point format codes in printf() and scanf().
This option is only available with the bare-metal/ELF toolchain.

#### -mfp64

Emit hardware floating point instructions for a 64-bit FPU - this is the default for 64-bit ISAs, but can also be used in conjunction with **-mips32r2**, which allows a 64-bit FPU to be paired with a 32-bit integer ALU.

# 2.2 Optimization Options

The "Optimize Options" section in the GCC manual lists the various optimization techniques that are available; serious users should read that. But it's traditional to provide numeric options - the higher the number, the more optimization. You should never compile without at least **-O** (equivalent to **-O1**) unless you're debugging; GNU C's unoptimized code is really unoptimized. Serious application code will be compiled with at least **-O2**; higher numbers may make your code bigger, and the trade-offs are discussed below.

With GCC each number adds more optimization techniques, while keeping all the options from the lower numbers. For a detailed list of which optimizations are enabled at each optimization level see the [Gcc] manual, but in summary:

#### -O0

Do not optimize.

#### -O, -O1

Optimize by trying to reduce code size and execution time, but without performing any optimizations that take a great deal of compilation time.

#### -O2

Performs nearly all available optimizations that do not significantly increase code size. In particular the compiler does not perform loop unrolling or function inlining when you specify **-O2**. Compared to the optimizer settings documented in the [Gcc] reference manual we also enable **-fweb** at **-O2**.

#### -O3

As **-O2**, but also enables **-finline-functions**, **-frename-registers** and **-funswitch-loops**.

#### -Os

Optimize for size: a lot like **-O2**, but with additional optimizations to reduce code size, and disabling **-falign-functions**, **-falign-jumps**, **-falign-loops**, **-falign-labels** and **-freorder-blocks**.
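
When a program is built in several configurations, it can be useful to record which broad optimization mode each object file was compiled with. GCC predefines `__OPTIMIZE__` in all optimizing compilations and `__OPTIMIZE_SIZE__` when optimizing for size, so a small helper can report the mode. This is only a sketch: the function name `build_optimization` is hypothetical, and the macros do not distinguish between **-O1**, **-O2** and **-O3**.

```c
/* Hypothetical helper: describe the optimization mode this file was
   compiled with. __OPTIMIZE_SIZE__ is tested first because -Os builds
   define __OPTIMIZE__ as well. */
const char *build_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";        /* compiled with -Os */
#elif defined(__OPTIMIZE__)
    return "speed";       /* compiled with -O1 or higher */
#else
    return "none";        /* compiled with -O0 */
#endif
}
```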
# 2.2.1 Optimizing for Speed

In our experience maximum performance is usually obtained by using the **-O2** or **-O3** optimization level. It depends on your application, because sometimes the increased code size due to -O3's loop unrolling can slow a program down, by increasing instruction cache thrashing. We suggest experimentation and using profile feedback to tune the loop unroller.

There are many other compiler flags which allow you to control individual optimizations. Not all of them will do anything useful, but here are a few which do have some useful effect:

#### -funit-at-a-time

Enabled at **-O2**. Instructs the compiler to perform whole module (intra-module) optimization by completely parsing a source file before beginning optimization and code generation. In this way the compiler can use information about all of the functions in the module to make better inlining and optimization decisions.

Furthermore this can perform "inter-module" optimization of your whole program, or a subset of it. This exposes many more optimization opportunities to the compiler, at the cost of greatly increased memory usage in the compiler and compilation time. This feature only works for C, and not yet C++. It requires changing your Makefiles so that instead of using individual commands to compile each module to object code, and then linking the object files together, like this:

```
mips-sde-elf-gcc -O2 -c moda.c -o moda.o
mips-sde-elf-gcc -O2 -c modb.c -o modb.o
mips-sde-elf-gcc -o prog moda.o modb.o ... -lc ...
```

... you now compile a group of modules together with a single invocation of the compiler (using GCC's **-combine** option, which it requires in order to accept several source files with **-c** and **-o**), like this:

```
mips-sde-elf-gcc -O2 -combine -c moda.c modb.c -o all.o
mips-sde-elf-gcc -o prog all.o ... -lc ...
```

#### --param inline-unit-growth=

The automatic function inliner (enabled at **-O3**) can be fine-tuned to limit the total growth of a module due to inlining, as a percentage.
For example **--param inline-unit-growth=5** limits the total code growth to approximately 5% - the default being 50%, which may be too high for some embedded applications.

#### -ffast-math

Switches on **-fno-math-errno**, **-funsafe-math-optimizations**, **-fno-trapping-math**, **-ffinite-math-only** and **-fno-signaling-nans**. These allow the compiler to be much more ambitious when optimizing floating point arithmetic, but it can generate incorrect code if a program depends on an exact implementation of the IEEE-754 specifications of precision, non-finites and exception handling. Many embedded applications won't care about this, and can safely enable these extra optimizations.

#### -fprefetch-loop-arrays

Enables automatic generation of additional instructions to prefetch data accessed sequentially within loops into the cache. On CPUs which implement the pref instruction, such as the 24K and 34K, this can increase performance when accessing large arrays. But since this adds extra instructions it may also reduce performance. You can instead use explicit directives where you know it matters.

#### -funroll-loops

Unroll loops whose number of iterations can be calculated at compile time, or at run time upon entry to the loop. It also turns on complete loop peeling (see below). This option makes code larger, and may or may not make it run faster. It is enabled automatically by **-fprofile-use**, i.e. when you tell the compiler to perform profile directed optimizations. Most of the loop optimizations can be further fine-tuned using **--param**; see the [Gcc] manual for more details.

#### -fpeel-loops

Peels loops when there is enough information that they do not roll much (e.g. from profile feedback). It also turns on complete loop peeling, which completely removes loops which iterate a small constant number of times. This option is enabled automatically by **-fprofile-use**.

#### -funswitch-loops

Enabled at **-O3**.
Moves loop invariant conditional tests out of the loop, and then duplicates the loop inside each branch of the conditional. For example:

```
for (i = 0; i < n; i++) {
    if (a < 0)
        arr[i]--;
    else
        arr[i]++;
}
```

would become:

```
if (a < 0) {
    for (i = 0; i < n; i++)
        arr[i]--;
} else {
    for (i = 0; i < n; i++)
        arr[i]++;
}
```

#### -fprofile-generate

Adds code to your program so that when run it will collect profile data which can then be used by **-fprofile-use**. This requires that your program has access to a file system where it can store the profile data. Your program will run slower with this extra profiling code, so don't use this option when generating your final executable.

#### -fprofile-use

Use the profile data generated by running a program compiled with **-fprofile-generate** to decide when optimizations which increase the size of a program are worthwhile. This also enables **-funroll-loops**, **-fpeel-loops**, **-ftracer**, **-fprofile-values** and **-fvpt**.

When using optimizations which increase code size we strongly recommend that you measure the effect of each option on your performance.

# 2.2.2 Optimizing for Size

Use the **-Os** flag to tell the compiler that your priority is to reduce code size. This is similar to **-O2**, but subtly alters optimization heuristics in the interests of making your code smaller. (Higher optimization levels can otherwise increase code size in order to achieve better performance.)

Further space savings can be made by the addition of some or all of the following flags. But here as elsewhere: if you're not quite sure what they do, don't use them. Most applications will do just fine with **-Os**.

#### -finline-functions

Inlining of very small functions can actually reduce code size, by removing the function call overhead.
This is now enabled by default by **-Os**, with the following additional parameters implied:

#### --param inline-unit-growth=0

Limits the total growth of a module due to inlining to approximately 0% - the default for speed is 50%.

#### --param max-inline-insns-auto=5

Sets the maximum size of function (in internal gcc instructions) which will be considered for automatic function inlining to 5 instructions - the default for speed is 120 instructions.

#### --param max-inline-insns-single=5

Similar, but applies to functions declared with an explicit inline and to C++ class methods - the default for speed is 500 instructions.

#### -fmerge-all-constants

Used in addition to the default **-fmerge-constants**, this enables merging of *const* variables, as well as constant strings and literals. Languages like C and C++ require that each non-automatic variable has a unique address, so using this will result in non-conformant behavior - you will need to check that your program can survive this.

#### -mno-check-zero-division

Prevents the normal insertion of inline code which checks for integer divide-by-zero etc.; this won't affect performance, but it is not recommended when debugging your program. This is the default when compiling for MIPS16.

#### -fno-rtti

For C++ programs which do not use *dynamic\_cast* and *typeid*, use this option to disable generation of C++ runtime type identification information for every class with virtual functions. This can reduce the size of the code and data.

#### -fno-exceptions

For C++ programs which do not use exceptions, use this option to disable the generation of the frame unwind information - which will significantly reduce the read-only data size.

#### -ffunction-sections

Causes each function to be emitted into its own unique object code section. See below how this can be used to reduce code size.

#### -fdata-sections

Like **-ffunction-sections**, but for variables.
#### 2.2.2.1 Code and Data Garbage Collection

You can use **-ffunction-sections** and **-fdata-sections** to reduce the size of some applications, by allowing automatic removal of unused functions and variables. But note that if your application does not contain much unused code or data, then these flags might slightly increase the total size, due to extra padding between functions and variables.

The trick is achieved by compiling your source files with one or both of these options, which causes each function and variable to be placed into a unique object code section, and then instructing the linker to "garbage collect" unused sections, as identified by performing a tree-walk of all code and data cross-references, starting from the program's entry point. The linker will do this when given the **-gc-sections** option. Here's an example showing just two files being compiled and linked:

```
$ mips-sde-elf-gcc -Os -ffunction-sections -fdata-sections -c a.c -o a.o
$ mips-sde-elf-gcc -Os -ffunction-sections -fdata-sections -c b.c -o b.o
$ mips-sde-elf-gcc -Wl,-gc-sections -o prog a.o b.o
```

Note that these options shouldn't be used when debugging your code - the multiple sections will confuse the debugger - only do this for production builds.

**Tip:** It may be counter-productive to use **-fdata-sections** when compiling MIPS16 code, since it disables the MIPS16 "section-relative addressing" optimization.

# 2.3 GP-relative Addressing

The GCC manual describes the **-Gnum** option, which controls the maximum size of global and static data items that can be addressed in one instruction instead of two. The default value is 8 bytes, which is large enough to hold all simple scalar variables.
This optimization technique is known in MIPS toolchains as gp-relative addressing, and relies on the compiler, assembler, linker and run-time initialization code cooperating to pool all of the "small" data items together into a single region, and then setting the gp register to point to the middle of this region. These items can then be referenced with a single instruction, using a signed 16-bit offset (i.e. -32768 to 32767) from the gp register (\$28), instead of the usual two instruction sequence.

However there are some potential pitfalls with this technique:

- You must take special care when writing assembler code to declare global (i.e. public or external) data items correctly:

  1. Writable, initialised data of *num* bytes or less (where *num* is the **-Gnum** value) must be put explicitly into the .sdata section, e.g.:

     ```
             .sdata
     small:  .word   0x12345678
     ```

  2. Global common data must be declared with the correct size, e.g.:

     ```
             .comm   small, 4
             .comm   big, 100
     ```

  3. Small external variables must also be declared correctly, e.g.:

     ```
             .extern smallext, 4
     ```

- In C you must declare global variables consistently in all modules which define or reference them. For external arrays either omit the size (e.g. extern int extarray[]), or give the correct size (e.g. int cmnarray[NARRAY]). Don't just give a dummy size of 1. Watch out particularly for use of the magic compiler/linker variables like \_end, \_edata, etc.: they should be declared as character arrays of unknown size, e.g.

  ```
  extern char _end[];
  ```

- If your program has a very large number of small data items or constants, the **-G8** option may still try to push more than 64KB of data into the "small" region; the symptom will be obscure relocation errors ("relocation truncated") when linking. Fix it by disabling gp-relative addressing with the **-G0** option; most of the time you won't lose too much.
- Some real-time operating systems and PROM monitors can be entered by direct subroutine calls, rather than via a "system call" trap. The use of simple subroutine calls between sections of the program which were not linked together means that it is not possible for the application and the monitor to share a gp area. In this case either the application or the monitor/RTOS (but not necessarily both) must be built with **-G0**.

When a particular **-G** option has been used for compilation of any set of modules, then it is usually necessary that all other modules and libraries should be compiled with the same value, to avoid linker relocation errors (e.g. one module references a variable which it thinks is in a "small data" section, while the other defines it in a non-small section). To avoid relocation overflow errors when linking, the safest solution is to compile all modules within a mixed 32-bit and MIPS16 system using the same value of **-G**.

Of course larger values of **-Gnum** can be used to increase the scope of this optimization. However, at the moment the only way to find the limit is an iterative process of recompiling with increasing values, until you overflow the 64K limit. One day it may be possible to determine an optimal value automatically.

# 2.4 Unaligned Data

The standard MIPS load and store instructions require that all data is aligned on its "natural" boundary, i.e. *shorts* on a multiple of 2 bytes, *ints* on a multiple of 4, and *doubles* on 8. If the alignment is not correct, then the CPU will generate an address exception. Because of this restriction, *gcc* will normally align all data structures and their fields on their natural boundaries.

However some software ported from 8 or 16-bit CPUs may rely on data structures whose fields align to a smaller boundary (e.g. for network protocol headers, or printer font cartridges, etc.). There are three ways to convince gcc to change its default alignment rules:

- 1.
Use the GCC \_\_attribute\_\_((packed)) extension on whole structures or individual structure fields - see the Extensions section of the GCC manual for full details.
- 2. Precede the definitions of packed structures with the single line #pragma pack(x), where x is the alignment boundary, in bytes. Follow the declaration with the line #pragma pack(), which restores the normal alignment rules. Don't forget this: your code may continue to work, but quietly become bigger and slower! For example, put #pragma pack(1) before a structure that must be byte-packed, and #pragma pack() after it.
- 3. In desperation you can compile your program with the **-fpack-struct** option, which removes padding from all structures. But that will make your whole program bigger and slower, and may cause problems such as ABI incompatibility with libraries that weren't compiled with this option.

The compiler will where possible use the MIPS left/right load and store instruction pairs to access unaligned structure fields, but this will be less efficient than if the data were correctly aligned. So use the pack options only on data structures where it is essential.

However these mechanisms do not solve the problem of how to handle unaligned pointers to simple scalar types (e.g. int). Currently there are two ways to handle this:

If you know that there are a few specific pointers which will frequently hold unaligned addresses, then you can modify your code to use the generic macros defined in <unaligned.h>. For example replace this:

```
int foo (int *ip)
{
    return *ip;     /* ip is known to be frequently unaligned */
}

void bar (short *sp, short val)
{
    *sp = val;      /* sp is known to be frequently unaligned */
}
```

by this:

```
#include <unaligned.h>

int foo (int *ip)
{
    return unaligned_get (ip);
}

void bar (short *sp, short val)
{
    unaligned_put (sp, val);
}
```

Where there are very occasional and unpredictable unaligned references, then you can install an exception handler which fixes up instructions which generate an unaligned Address Error (**xcptades** or **xcptadel**) exception.
So long as you use standard exception handlers then you can do this by putting a call to \_mips\_unaligned\_init() at the beginning of your code. Certainly don't do this for performance critical code - just use it as a way to get started when first porting an application.

# 2.5 Software Floating Point

When an application performs floating point computations and the target CPU is not equipped with a floating point unit (called coprocessor 1, or "CP1" in MIPS-speak), then the floating point operations must be performed by software subroutines. The C run-time library (provided with the bare-metal/ELF version of the tools) includes two libraries:

- A library that emulates the Floating-Point Coprocessor, including the control registers. This library is called libcs3-mips-cp1.a, or -lcs3-mips-cp1 to the linker. This emulator calls a lower-level library that implements each floating-point instruction (see next bullet item).
- A library that implements each IEEE-754 compliant floating point operation using only integer instructions (libcs3-mips-fpemu.a, or -lcs3-mips-fpemu to the linker).

There are different ways in which software emulation of floating-point is implemented:

- When you use the compiler's **-msoft-float** option it will keep all floating point values in integer registers (a pair of them for double-precision when using 32-bit registers), and will generate direct calls to the software floating point routines within the gcc compiler to perform all floating point arithmetic. This is the best option if you know that you will never have a hardware floating point unit in your target system.
- If **-msoft-float** is not used (or **-mhard-float** is) then the compiler will emit code which uses hardware floating point registers and instructions.
You then have to include a "CP1 emulator" in your program which catches "Coprocessor Unusable" traps, interprets the instructions, and invokes the software library to emulate them. This results in even slower code than when using **-msoft-float**, but may be the option to use when creating a single program binary which must be capable of working either with or without a hardware floating point unit, detected at run-time. In this case both of the emulation libraries described above are needed. To include the FPU emulator, these options must be used when linking:

```
-lcs3-mips-cp1 -lcs3-mips-fpemu -Wl,--defsym,__cs3_mips_float_type=2
```

- The final option is only for the IDT R4650 and R4640 CPUs, which are equipped with only a single-precision floating point unit. For these CPUs you should use the **-msingle-float** flag, which tells the compiler to generate hardware floating point instructions for single-precision operations, but call the emulation routines within the gcc compiler for double-precision.

Remember that in all cases emulated floating point is much slower than hardware - up to 100 times slower for the trap-based emulation.

# 2.6 MIPS16® ASE Support

The MIPS16 ASE is only available on bare-metal/ELF targets. It is not available for the GNU/Linux target.

The "MIPS16" instruction set is an extension to the MIPS architecture (an "ASE") that allows you to build much smaller binaries. It requires that the CPU implement a set of operations encoded with fixed-length 16-bit instructions; this new instruction set is selected with a "mode switch" controlled by a "least significant bit" included in the instruction address. You can successfully build and run a program with a mix of functions built both with MIPS16 and conventional instructions, but you can't mix the two instruction sets inside one C function. The MIPS16 ASE is most useful to the smallest and most deeply embedded systems, and is often not implemented on higher-end CPUs.
"MIPS16e" is the name of an enhanced version of the MIPS16 instruction set; the enhancements were worked out from experience and help the compiler generate even smaller code. Note that all those MIPS32-compliant CPUs which support the MIPS16 ASE implement the MIPS16 extensions. Most often a MIPS16 operation corresponds to a single conventional MIPS instruction, but the small size imposes restrictions on choice of registers and the size of ``immediate" fields. For straightforward integer code -mips16 can cut code size by around one third, but it certainly won't do this if: - 1. you use floating point: the MIPS16 ASE doesn't encode f.p instructions or registers, which have to be replaced by calls to 32-bit code even if the CPU has an FPU, or - 2. you use unaligned data structures heavily: there are no lwl or lwr MIPS16 instructions, so these have to be synthesised as a sequence of byte loads, shifts, ors, etc. Most users will never, and should never, write MIPS16 assembler code. You'll find no assembler language documentation here. MIPS16 instructions are meant to be an intermediate code generated by the compiler to save space - possibly at the cost of some speed. MIPS16 CPUs always run the normal 32-bit MIPS instruction set as well, which is usually a better choice for assembler modules. MIPS16 functions can safely call functions consisting of ordinary 32-bit MIPS instructions, and vice versa. The hardware keeps track of MIPS16 mode by adding a bit zero to the instruction address pointer; so a jump-register instruction to an odd address implicitly switches into MIPS16 mode. Because normal absolute jal instructions don't contain the bottom address bits (since regular MIPS instructions are 4 byte aligned), a new instruction jalx is added which calls MIPS16 code from regular 32-bit code, or vice versa. The linker automatically converts a jal to a jalx when it sees a call across the MIPS16/regular-MIPS divide. 
MIPS16 functions using floating point must be declared carefully. The compiler automatically generates small "trampoline" stubs to copy floating point arguments and results back and forth between "hard" floating point registers and the MIPS16 integer registers used for floating point arguments. It's essential to provide full prototypes for such functions.

#### 2.6.1 Global Variables and MIPS16 Code

The global-pointer (GP) optimization used in 32-bit MIPS code to speed up access to small global variables is not usually appropriate to MIPS16 code, with its restricted load offsets (all GP-relative addresses would require an extended instruction).

A mechanism has been developed for MIPS16 code which accesses variables defined within the same compile unit as the code using short "section relative" offsets. This optimization is of no benefit to "extern" or "common" variables, but is a big win when accessing locally defined variables.

In order for this optimization to be more effective, code compiled using **-mips16** or **-mips16e** will by default also imply the specification of "**-G0 -fno-common**". This has the following implications:

- If you are compiling any modules using a 32-bit ISA, but you expect that they may be linked with MIPS16 code, then you must specify **-G0** explicitly for the 32-bit modules. You can still link with existing, pre-compiled, 32-bit libraries that were compiled with gp-relative addressing enabled, so long as the precompiled code does not try to reference global symbols defined in the **-G0** compiled code. Using **-mno-gpopt** is a better choice because then the small variables are placed in the small data sections, but the compiler just won't generate short references to them. The safest solution is to compile all modules in a mixed 32-bit and MIPS16 system using **-G0**.
- The "traditional" (but not ISO / ANSI compatible) C "common variable" behaviour, named after the Fortran construct, will no longer work. This behaviour allows several modules to declare the same global variable, as long as no more than one of the declarations actually initialises the variable. If possible you should avoid relying on this feature in portable code, but if it cannot easily be changed in your code, then you will have to specify **-fcommon** on the command line, and you will lose the section-relative addressing optimization on uninitialised global variables (uninitialised static variables will still be optimized). Existing, pre-compiled libraries which use common variables will continue to work correctly when linked with code compiled with **-fno-common**, as long as they don't initialise the same variables.
- You can flag individual variables where "common" behaviour is absolutely required, by using gcc's \_\_attribute\_\_ mechanism. For example: int errno \_\_attribute\_\_((common));

# 2.6.2 Global Register Variables

In MIPS16 code only 8 registers are directly usable for arithmetic and pointers, but the remaining 24 registers are accessible indirectly. The compiler allows MIPS16 code to use gcc's global register variable extension to access these extra registers, which can provide a performance boost for global variables which are very frequently accessed in many separate, small functions.

It is recommended that only the callee-saved registers \$s3-\$s7 are used for this purpose (\$s0 and \$s1 are used by normal MIPS16 code, \$s2 is used by MIPS16 code if there is a hardware FPU, and \$s8 is sometimes used as a stack frame pointer in 32-bit code). Global register variables must be declared in a header file which is common to all modules, so that the register does not get reused for normal variables or temporaries by 32-bit code.
Here is an example of how to declare and use a global register variable:

```
register struct insn *curinsn __asm__("$s3");

unsigned int
getinsn_opcode (void)
{
    return curinsn->opcode;
}
```

# 2.6.3 Divide by Zero Checks (-mcheck-zero-division)

When generating MIPS16 code the compiler will not generate the extra code to check for division by zero, so divide by zero will generate an undefined result. If for debugging purposes you wish division by zero to generate a trap, then use the **-mcheck-zero-division** compiler option.

# 2.6.4 Execute-only Code / Split I-D RAM

In MIPS16 code the compiler normally places implicit constants inline within the executable code section, interleaved with or following the function which uses them. This allows the constants to be accessed efficiently using the MIPS16 PC-relative load and addiu instructions.

However some MIPS Technologies cores support independent, Harvard-style on-chip instruction and data memories known as SRAM or SPRAM. In such a configuration a program cannot read constant data from the I-side memory without special hardware support, which causes the CPU to treat the MIPS16 PC-relative load instructions like an instruction fetch, and "redirect" the load from the D-side memory port to the I-side.

Use the **-mno-data-in-code** flag when compiling MIPS16 code to run in ISRAM on a system without the hardware redirect. It will generate larger and slower code (5% larger on average) - so don't use it unless you have to. Also make sure that you use the **-mno-data-in-code** flag when linking your program, to select a compatible multilib variant.

When the **-mno-data-in-code** flag is used, it also switches off the **-G0** option - otherwise the default for MIPS16 code - so that it can place the constants into the small data section, and access them via the \$gp register. You can use the **-G0** option explicitly to prevent this, but it may increase code size significantly.
The **-mcode-xonly** flag is a weaker alternative for MIPS16 code running in on-chip ISRAM where the system does implement the hardware redirect (e.g. the M4K). The hardware redirect operates only for PC-relative loads, but MIPS16 code can still create pointers to the implicit constants - most obviously to literal character strings - to be used later by conventional load instructions, which would then read the data from the wrong memory. The **-mcode-xonly** option instructs the compiler not to place constant strings and computed jump tables into the code segment, while keeping simple integer and floating point constants inline with the code. This will usually result in only slightly larger code than a standard MIPS16 compilation. All of the MIPS16 libraries are now built with this option.

The **-mcode-xonly** flag may also be useful for cores which implement the SmartMIPS ASE, which provides an extended virtual memory protection model that can mark pages as "execute-only". Similar to the I/D redirect above, the MIPS16 PC-relative load represents itself to the TLB as an instruction-fetch so, for MIPS16 code running in mapped space, use **-mcode-xonly** to prevent strings and jump tables from being placed in the executable code section.

# 2.6.5 Generating MIPS16® Code

Add the compiler flag **-mips16** or **-mips16e**, and the module will be compiled using MIPS16 or MIPS16e instructions to generate compact code. The flags are (mostly) orthogonal in effect to other flags which set code generation options.

It goes further than that: the **-mips16** or **-mips16e** flag used on the mips-sde-elf-gcc command line when linking your files will select MIPS16 or MIPS16e libraries.

Back to compilations: sometimes a module might contain functions you want to compress, and some you would rather compile to regular 32-bit instructions - perhaps because the 32-bit instructions will give you better performance, or because you need to use instructions that are not available in MIPS16.
MIPS16 code always takes longer to execute within the CPU, but if instruction fetch bandwidth is the critical determinant of the performance of some piece of code, then the smaller size of MIPS16 code can make it faster overall.

The compiler uses the GCC \_\_attribute\_\_ extension to permit the instruction set to be selected on a per-function basis. For example:

```
__attribute__((mips16)) void smallfunc ()
{
    /* generates MIPS16 code */
}

void __attribute__((nomips16)) bigfunc ()
{
    /* generates 32-bit MIPS code */
}

void normalfunc ()
{
    /* compiled as per command-line flags */
}
```

It is likely that the attribute construct will be hidden by a macro, which can be controlled by an ifdef, e.g.

```
#if __mips
#define large   __attribute__((nomips16))
#define compact __attribute__((mips16))
#else
#define large
#define compact
#endif

compact void smallfunc ()
{
}
```

If the command-line selects **-mips32**, then \_\_attribute\_\_((mips16)) will generate extended MIPS16 instructions, otherwise it will generate only "standard" MIPS16 instructions. Similarly, if the command-line selects **-mips16e**, then \_\_attribute\_\_((nomips16)) will generate MIPS32 code. If you have used the mips16 attribute, but wish to prevent it from taking effect, then compile with **-mno-mips16**.

# 2.6.6 Sibling Call Optimization

If you are mixing MIPS16 functions and 32-bit functions in your program, then it is not safe to allow the compiler to perform its "sibling call" optimization, which can replace a call at the end of a function by a jump to the other function. Since there is no jx instruction to switch from 32-bit to MIPS16 mode, only jalx, this optimization must be prevented when a 32-bit function calls a MIPS16 function. If it were to occur, then it would result in an error message from the linker when it tried to relocate the jump instruction.

You can prevent this optimization from taking place in two ways:

- 1. Most easily by using the compiler's **-fno-optimize-sibling-calls** option.
- 2.
At a more fine-grain level, by ensuring that all global MIPS16 functions are correctly declared with function prototypes which include \_\_attribute\_\_((mips16)) in the function type. It is not necessary to do this for 32-bit functions, since the compiler will never generate sibling calls from MIPS16 functions.

### 2.6.7 Main differences between MIPS16® and MIPS16e™ Code

The new MIPS16e instructions clean up a few wrinkles where the original MIPS16 definition caused the compiler to generate wasteful code. These are:

- An instruction to save registers and do other function entry housekeeping, with a matching instruction to restore registers on function exit. (They only support a 32-bit register model.)
- Instructions which sign- or zero-extend partial-word values in registers.
- Variants of the indirect jump and jal instructions which don't have a visible branch delay slot. You'll be surprised how much they help.

If you have only the original MIPS16 instruction set available, then there is a compiler option **-mentry** which will reduce code size further, by generating "reserved" MIPS16 instruction codes to perform commonplace function entry/exit housekeeping. These unimplemented instructions cause an exception and require an appropriate exception handler - that's dreadfully slow with the standard kit, and will be pretty bad even with a tuned handler.

# 2.7 Predefined Preprocessor Macros

Your program can detect what sort of CPU and instruction set it is being compiled for by testing a number of predefined C preprocessor macros.
For example:

```
#if __mips == 32
#if __MIPSEB
/* big-endian MIPS32 code */
#endif
#if __MIPSEL
/* little-endian MIPS32 code */
#endif
#endif
```

The full table of predefined macros defined by GCC for MIPS is as follows:

**Table 2.2 MIPS Predefined Macros**

| Macro | Purpose |
|-------|---------|
| \_\_mips | Defined whenever compiling code for a MIPS ISA. Has as its value the selected ISA level, e.g. 1 for **-mips1**, 32 for **-mips32**, and 64 for **-mips64**. |
| \_\_mips\_isa\_rev | The ISA revision level - only relevant for MIPS32 and MIPS64 - has the value 1 for the original revision, or 2 for the second revision of the ISA (i.e. **-mips32r2**). |
| \_\_mips64 | Defined when compiling for an ISA which supports 64-bit general purpose registers. Not the same as "\_\_mips == 64", since it will also be defined for the 64-bit MIPS III and MIPS IV ISAs. |
| \_\_mips\_fpr | Specifies the size in bits (64 or 32) of each floating point register, as selected by the base ISA and ABI, or by the **-mfp64** compiler flag. |
| \_\_mips16 | Defined when **-mips16** is used to select generation of compact MIPS16 code. |
| \_\_mips\_hard\_float | Defined when generating hardware floating point instructions. |
| \_\_mips\_soft\_float | Defined when **-msoft-float** is used, and the compiler will generate calls to a software floating point emulation library. |
| \_\_mips\_no\_float | Defined when the **-mno-float** flag is used, to request libraries without floating point support, to reduce program size. Otherwise equivalent to \_\_mips\_soft\_float. |
| \_\_mips\_dsp | Defined when the DSP ASE is enabled, either because the **-mdsp** compiler flag is specified, or because the **-march=** option specifies one of the CPUs from the 24ke/34k/74k family. |
| \_\_mips\_paired\_single\_float | Defined when **-mpaired-single** is used to enable the "paired single" SIMD floating point extension. |
| \_\_mips3d | Defined when the **-mips3d** flag is used to enable the MIPS 3D ASE. |
| \_\_mips\_smartmips | Defined when the **-msmartmips** flag is used to enable the SmartMIPS ASE, or when enabled implicitly because **-march=** is set to 4ksc or 4ksd. |
| \_MIPSEB | Defined when compiling code for a big-endian CPU, i.e. when the **-EB** flag is used. |
| \_MIPSEL | Defined when compiling code for a little-endian CPU, i.e. when the **-EL** flag is used. |
| \_MIPS\_ARCH\_*CPU* | Where *CPU* is the name specified with the compiler's **-march=** option, converted to upper case. |
| \_MIPS\_TUNE\_*CPU* | Where *CPU* is the name specified with the compiler's **-mtune=** option, converted to upper case. |
| \_SOFT\_FLOAT | Same as \_\_mips\_soft\_float, for compatibility with previous versions of MTI SDE. |
| \_NO\_FLOAT | Same as \_\_mips\_no\_float, for compatibility with previous versions of MTI SDE. |
| \_\_pic\_\_ | Defined when generating MIPS/abi position-independent code, as selected by the **-fpic** or **-mabicalls** compiler flags. |
| \_\_PIC\_\_ | Equivalent to \_\_pic\_\_. |
| \_\_SDE\_MIPS\_\_ | Defined to indicate that the code is being compiled for the mips-sde-elf target. |
| \_MIPS\_SIM | Indicates the ABI or calling convention in use - takes one of the values \_ABIO32 (1), \_ABIN32 (2), \_ABI64 (3), \_ABIO64 (4), or \_ABIEABI (5). |
| \_MIPS\_FPSET | Indicates the number of 64-bit floating point registers available: 16 or 32. This encodes the same information as \_\_mips\_fpr above, but in a different way, and is included for compatibility with Irix. |
| \_MIPS\_SZINT | Indicates the size in bits of the int type: 32 or 64. |
| \_MIPS\_SZLONG | Indicates the size in bits of the long type: 32 or 64. |
| \_MIPS\_SZPTR | Indicates the size in bits of pointer types: 32 or 64. |

The compiler also makes a number of predefined "assertions" which can be tested at compile-time; however these are deprecated in favor of the more widely supported conventional pre-processor macros and constants.

# **GDB Debugging with the MDI interface**

NOTE: Throughout this document, the command prefix "mips-sde-elf-" is used (assuming that you're using the bare-metal/ELF toolchain), for example mips-sde-elf-gdb. If your target is actually Linux, the command prefix would be "mips-linux-gnu-".

Source-level debugging of an embedded application requires two components. The host debugger *mips-sde-elf-gdb* has access to your source and object files, and understands the structure of your program and data. But to interact with the running software *gdb* needs to be able to read/write memory and registers, and access on-CPU debug functions on your target system.

The connection between GDB and the target will be one of the following:

- 1. A connection which exploits an on-CPU debug connection such as MIPS Technologies' EJTAG. This will need a special piece of hardware (a probe) connected to the CPU on the board under test, some physical connection to the probe (typically Ethernet, USB or parallel port), and some host software to connect GDB to the probe.
  - MIPS Technologies promotes a software interface called "MDI"; it's a standard interface for the on-host software which connects to an EJTAG probe.
  - Some EJTAG probe manufacturers don't provide an MDI interface, but are compatible with *gdb*'s standard remote debug protocol (Abatron, for example). Some others have totally proprietary interfaces, in which case they may come with their own proprietary debugger, which may be compatible with the compiler - check with your probe supplier.
- 2.
An Ethernet or serial port connection to the target, together with a "target monitor" program running on your target CPU. The target monitor is a little "server", attached to the host via serial port or network link, which can be requested to inspect or patch memory, to catch exceptions (particularly breakpoint exceptions) and report the application's CPU state.
   - MIPS Technologies' YAMON monitor includes a built-in target monitor, which can communicate directly with *mips-sde-elf-gdb* over a serial port.

Your target may not be a real piece of hardware, but a software simulator. The basic GNU MIPS simulator included with the toolchain is built in to *mips-sde-elf-gdb*, while MIPS Technologies' MIPSsim simulator (much more grown-up and accurate) is available as a separate DLL which connects to *gdb* via the MDI interface. The MIPSsim simulator is available from MIPS Technologies.

Usually you will use *gdb*'s load command to download your application to the target - but that can be very slow and tedious over a serial port. If you don't have a dedicated debug probe, then a ROM monitor which supports Ethernet downloading (such as the YAMON monitor) can be very helpful - see Chapter 4, "Manual Downloading" on page 57.

All of the debugging features described in the [Gdb] reference manual are available for remote programs, but note:

1. While you may be able to download a program via Ethernet, or some other high-speed mechanism, you will usually still need some other connection (e.g. EJTAG or serial cable) by which *gdb* can control the monitor. No known MIPS boards support a complete download and debug cycle over Ethernet alone.
2. Once a program has started running it cannot be restarted simply by using *gdb*'s run command - the initialised data has most likely been modified by the program, and must be re-initialised by reloading the program first.

Please refer to the printed or online GDB manual for more information about the GDB command line interface.
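The second note above can be turned into a small convenience. The following is a minimal sketch, not something supplied with the toolchain: a shell function (the names `write_rerun_command` and `rerun` are hypothetical) which writes a *gdb* command file defining a user command that reloads the program before restarting it. It assumes only *gdb*'s standard `define` facility and the load and run commands mentioned above.

```shell
# Write a gdb command file that defines a "rerun" user command.
# "rerun" is a hypothetical name chosen for this sketch; it is not a
# built-in gdb or MDI command.
write_rerun_command() {
    # $1: path of the command file to create
    cat > "$1" <<'EOF'
# Reload the program image before restarting it, so that the
# initialised data is restored to its original contents.
define rerun
  load
  run
end
EOF
}
```

You might then start the debugger with something like `mips-sde-elf-gdb -x rerun.gdb helloram`, connect to your target as usual, and type `rerun` wherever you would otherwise have typed `load` followed by `run`.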
## 3.1 MDI Debugging

MIPS Technologies promotes a software API called "MDI"; it's a standard procedural interface by which host software can connect to an EJTAG probe or software simulator, via a dynamically loaded library conforming to the Microprocessor Debug Interface (MDI) specification.

Once you have configured MDI for the first time, following the instructions below, it is as easy to operate as any other *gdb* remote target. A typical command-line debug session might start like this on the Host system:

```
$ setenv LD_LIBRARY_PATH /path/to/mdi/link/library:${LD_LIBRARY_PATH}
$ setenv GDBMDILIB library_filename
$ mips-sde-elf-gdb xxxram
(gdb) b main
(gdb) target mdi 15:1
(gdb) load
(gdb) run
Breakpoint 1 at main...
```

The following sections look in more detail at setting up and using the two most common MDI targets: the MIPSsim simulator and an MDI-enabled EJTAG probe.

### 3.1.1 MDI Debugging with the MIPSsim™ Simulator

MIPS Technologies Inc. has developed the comprehensive and accurate MIPSsim simulator for its core CPUs. If your toolchain did not include the MIPSsim simulator, it is available from MIPS Technologies. The MIPSsim software runs on Windows (NT, 2000 and XP), x86 Linux, and Solaris 2.6 or above.

#### 3.1.1.1 Configuring the MIPSsim<sup>™</sup> Simulator for GDB

*mips-sde-elf-gdb* connects to the MIPSsim simulator via its MDI library interface, and there are a few configuration steps which you must perform first, so that *gdb* can "find" the MIPSsim library.

1. First install, configure and test your MIPSsim package, following the instructions in the MIPSsim Getting Started Guide supplied with it.
2. *mips-sde-elf-gdb* finds the MDI library using environment variables.
##### For Linux/Unix:

```
For bash, ksh, etc:
$ export LD_LIBRARY_PATH=/path/to/mdi/link/library:$LD_LIBRARY_PATH
$ export GDBMDILIB=library_filename
For csh and tcsh:
$ setenv LD_LIBRARY_PATH /path/to/mdi/link/library:${LD_LIBRARY_PATH}
$ setenv GDBMDILIB library_filename
```

The library filename is most likely something like MIPSsim\_MDI.so or MIPSsim\_MDI.soc.

##### For MS Windows:

You'll need to make sure that the directory containing the MIPSsim MDI DLL has been added to your PATH variable, or copy the DLL to \windows\system on Win9x, or \windows\system32 on Win32 systems (NT and above).

There are multiple ways to set the environment variables:

- Right-click My Computer -> Properties -> Advanced -> Environment Variables
- OR, if you're using Cygwin: `$ export PATH=/path/to/mdi/link/library:$PATH`
- OR, in a Windows shell: `$ set PATH=yourdrive:\path\to\mdi\link\library;%PATH%`

You'll also need to set the GDBMDILIB environment variable: `$ export GDBMDILIB=library_filename`

The DLL filename is most likely something like MIPSsim\_MDI.dll.

3. Now you can start the debugger and use the MDI target:

```
$ mips-sde-elf-gdb helloram
(gdb) target mdi 15:1
```

#### 3.1.1.2 Selecting the MIPSsim<sup>™</sup> CPU

When you connect to the MIPSsim simulator you have to tell it which CPU core to simulate. You do this by specifying an MDI *target group* and *device* pair. The way that you do this depends on whether you are using the command-line or GUI interface to *gdb*.

1. For the command-line interface to *gdb* enter these commands:

```
$ mips-sde-elf-gdb
(gdb) show mdi devices
Targ 01: Default
Dev 01: MIPS32_4Kc BE
Dev 02: MIPS32_4Kc LE
Dev 03: MIPS32_4Km BE
Dev 04: MIPS32_4Km LE
Dev 05: MIPS32_4Kp BE
Dev 06: MIPS32_4Kp LE
Dev 07: MIPS32_4KEc BE
Dev 08: MIPS32_4KEc LE
Dev 09: MIPS32_4KEm BE
Dev 10: MIPS32_4KEm LE
```

That should print out a list of all the CPU devices supported by the MIPSsim software, and their associated target group and device numbers.
If it instead says "MDI not available", then you have probably not installed the MIPSsim package correctly, or not run the mdi command to select the MIPSsim library.

Now you can tell *gdb* which device to use. Assuming that you wanted a little-endian 4KEc core, then looking at the above list we can see that it's target group 1, device 8. So:

1. Set the GDBMDITARGET and GDBMDIDEVICE environment variables to the appropriate target group and device numbers.

```
For bash, ksh, etc:
export GDBMDITARGET=1
export GDBMDIDEVICE=8
For csh and tcsh:
setenv GDBMDITARGET 1
setenv GDBMDIDEVICE 8
```

2. Or add the following *gdb* commands to your .gdbinit file:

```
set mdi target 1
set mdi device 8
```

3. Or specify them on the target command line when you connect to the MDI library, for example:

```
$ mips-sde-elf-gdb
(gdb) target mdi 1:8
```

Note that in all cases MIPSsim's MDI interface currently lists all of the CPU cores which it knows about, even if that core simulator is not installed. If you select a CPU type for which you do not have the core simulator library installed, then you will see an error reported when you try to connect to it.

#### 3.1.1.3 Building for a MIPSsim<sup>™</sup> Target

Here's an example of how to build a program with the MIPSsim simulator as the target.

1. Compile the program:

```
$ mips-sde-elf-gcc -T mipssim-hosted.ld hello.c -g -o hello.x
```

2. You can run the program in command-line mode:

```
$ mips-sde-elf-gdb helloram
(gdb) target mdi
(gdb) load
(gdb) run
...
(gdb) quit
```

#### 3.1.1.4 Downloading to a MIPSsim<sup>™</sup> ROM Target

This section is included in case you need to write your own Makefiles, or in case something goes wrong.

When you build a program to blow into a physical ROM memory (e.g. EPROM or Flash) you can use the mips-sde-elf-conv program to convert it into an ASCII S-record file (or similar), suitable for a PROM programmer.
At the same time its initialised, writable data segment is relocated and concatenated to the end of the code segment, from where it is later copied down into RAM.

But *gdb* can't load an S-record file, so how do you load a ROM image into a bare MIPSsim simulator via *gdb*? The answer is that mips-sde-elf-conv takes your executable ELF file, and outputs a new, relocated ELF file with the `.relf` extension. The relocation is done in exactly the same way as when creating a real, physical PROM image.

The final step in the chain is that *gdb*'s "load" command automatically checks for a file with the same name as your executable, but with the `.relf` extension. If this is found then it is this file that will actually be downloaded via MDI into the simulated MIPSsim ROM. When execution is started the ROM startup code will (after initialising caches, etc) copy the initialised data and possibly code into RAM. Your program image will now correspond to the original ELF executable file, and debugging can begin.

Finally, if you are not using *gdb* to load and run the program, but wish to load a program directly into the MIPSsim simulated ROM using the APP\_FILE setting in the MIPSsim configuration file, then remember to use the `.relf` file, not the original ELF file.

#### 3.1.1.5 Non-standard MIPSsim<sup>TM</sup> Configurations

By default GDB will dynamically create a MIPSsim CPU configuration file to match your selected CPU type. It does this from a template stored in file <install\_top\_dir>/share/mipssim.cfg.

While this will be a sufficient MIPSsim configuration to get you going, if you later need to change any of the CPU or memory parameters, or add new device or CorExtend libraries, then you'll need to create your own MIPSsim configuration file. You can do this using either the MIPSsim GUI, supplied as part of the MIPSsim package, or by using a simple text editor. Full details of the configuration file format are contained in the MIPSsim documentation.
The crucial configuration settings which you must change from the defaults supplied with the MIPSsim package are as follows:

| Setting | Description |
|------------|-------------|
| APP_FILE | Must be blank, or commented out. |
| DUMP_FILE | Must be blank, or commented out. |
| BIG_ENDIAN | To avoid a warning message set this to match your program's endianness. |
| TRACE_FILE | In v4.x of the MIPSsim simulator, just setting this will cause a trace log to be written to that file. You may not want to do that for normal debugging, since it will slow down the simulator. |

**Table 3.1 MIPSsim Configuration Settings**

You then have to tell *gdb* and the MIPSsim library how to find the configuration file which you just created, either:

1. Set the GDBMIPSSIMCONFIG environment variable to the name of the file, e.g.

```
For bash, ksh, etc:
export GDBMIPSSIMCONFIG=/path/to/myconfig.cfg
For csh and tcsh:
setenv GDBMIPSSIMCONFIG /path/to/myconfig.cfg
```

2. Or set it in the local .gdbinit file as follows:

```
set mdi configfile /path/to/myconfig.cfg
```

It may be that you are happy to use GDB's default CPU core configuration file, but want to define a new device configuration file with more realistic memory timings, or new device models. GDB will add a reference to your device configuration file to its auto-generated core configuration file if you do one of the following:

1. Set the GDBMIPSSIMDEVCFG environment variable to the name of the device configuration file.
2. Or set it in the local .gdbinit file:

```
set mdi devcfgfile /path/to/mydev.cfg
```

### 3.1.2 MDI Debugging with an EJTAG Probe

MIPS Technologies is encouraging EJTAG probe manufacturers to offer an MDI interface to their devices.
This provides a powerful way to debug system software using *gdb* at the lowest level, directly controlling the CPU core.

#### 3.1.2.1 Configuring your Probe for GDB

1. First follow the installation instructions supplied with your probe hardware, and check that you can access and control your CPU core via the probe vendor's own command-line debug tool.
2. Set the environment variables so that *gdb* can reach the link library for the probe:

##### For Linux/Unix:

```
For bash, ksh, etc:
$ export LD_LIBRARY_PATH=/path/to/mdi/link/library:$LD_LIBRARY_PATH
$ export GDBMDILIB=library_filename
For csh and tcsh:
$ setenv LD_LIBRARY_PATH /path/to/mdi/link/library:${LD_LIBRARY_PATH}
$ setenv GDBMDILIB library_filename
```

##### For MS Windows:

You'll need to make sure that the directory containing the probe's MDI DLL has been added to your PATH variable, or copy the DLL to \windows\system on Win9x, or \windows\system32 on Win32 systems (WinNT and above).

There are multiple ways to set the environment variables:

- Right-click My Computer -> Properties -> Advanced -> Environment Variables
- OR, if you're using Cygwin: `$ export PATH=/path/to/mdi/link/library:$PATH`
- OR, in a Windows shell: `$ set PATH=yourdrive:\path\to\mdi\link\library;%PATH%`

You'll also need to set the GDBMDILIB environment variable: `$ export GDBMDILIB=library_filename`

The DLL filename is most likely jnetfs2mdilib.dll.

3. Now you can select your probe configuration and run *mips-sde-elf-gdb*, for example:

```
$ mips-sde-elf-gdb helloram
(gdb) target mdi
```

#### 3.1.2.2 Selecting the EJTAG CPU

EJTAG probes connected by USB or parallel port probably support only one CPU at a time - the one to which they are currently connected. In that case you can probably connect to the probe without having to specify an MDI device number. But with some probes you may have to tell their MDI interface the name of the CPU, or the probe's Ethernet address, or some such.
This selection can be made following exactly the same procedure as for selecting a MIPSsim CPU type, described in Section 3.1.1.2 "Selecting the MIPSsim<sup>TM</sup> CPU".

#### 3.1.2.3 Building for an EJTAG-connected Target

Here's an example of how to build a program with the MALTA board as the target.

1. Compile the program:

```
$ mips-sde-elf-gcc -T malta-24kc-ram-hosted.ld hello.c -g -o hello.x
```

2. You can run the program in command-line mode:

```
$ mips-sde-elf-gdb helloram
(gdb) set mdi connectreset 7
(gdb) target mdi
(gdb) load
(gdb) run
...
(gdb) quit
```

#### 3.1.2.4 Resetting the CPU

When you connect to a remote CPU via an EJTAG probe to download and run your program, you may want to simultaneously reset the CPU to ensure that it always starts in a known good state. However on many evaluation boards the reset signal will also reset the memory controller, which will prevent you (and *gdb*) from accessing DRAM until it has been programmed.

Rather than teaching *gdb* how to initialise your memory controller, the simplest thing to do is allow the onboard PROM monitor (e.g. the YAMON monitor) to run just long enough to program the memory controller, and then halt the CPU so that *gdb* can take control. This behaviour is controlled by *gdb*'s "mdi connectreset" setting, which can have the following values:

- Off: the default value. In this case *gdb* does not try to reset the remote CPU, it simply halts it. Note that for this to work you may need to modify your probe software's configuration files to prevent it from automatically resetting the CPU.
- On: in this case *gdb* will reset the CPU and then halt it immediately. You shouldn't use this unless your memory controller automatically resets into a usable state, or you are willing to use *gdb* commands to program it manually.
- N: in this case *gdb* will reset the CPU, allow it to run for N seconds, and then halt it. A value of 7, as used in the examples in this chapter, is usually sufficient to allow the YAMON monitor to initialise the board.
You can effect this setting in a number of different ways:

1. Set it in the local .gdbinit file as follows:

```
set mdi connectreset 7
set mdi connectreset on
```

2. Or set the GDBMDICONNRESET environment variable:

```
For bash, ksh, etc:
export GDBMDICONNRESET=7
export GDBMDICONNRESET=0 # On
For csh and tcsh:
setenv GDBMDICONNRESET 7
setenv GDBMDICONNRESET 0
```

3. Or you can set it each time you enter the "target" connect command, by appending ",rst=N" to the device number. For example:

```
(gdb) target mdi 1,rst=7
```

### 3.1.3 MDI Debugging Tips

#### 3.1.3.1 Command line arguments

If your application has been linked with the standard run-time system, then you can pass command-line arguments to your application (via argc and argv) when debugging via MDI:

1. When using the *gdb* command-line interface, append the arguments to *gdb*'s "run" command, or set the *gdb* "args" variable. See the [Gdb] reference manual for more details.

#### 3.1.3.2 MDI Host File I/O

If your application has been linked with the standard run-time i/o system, then console and file i/o requests will be passed via the MDI interface to *gdb*. You can see your program's output in *gdb*'s console window. If your program attempts to read from its console, then you can input text through *gdb*'s console window when you see the "app>" prompt. Your program can also read and write files on your host computer.

Beware that this i/o mechanism will not work if you don't use *gdb* to load and run your program; for example if you load the program directly into MIPSsim using the APP\_FILE setting in the MIPSsim configuration file. In such cases you must find some other way to perform console and file i/o, such as via an additional MIPSsim device which you provide.

#### 3.1.3.3 MDI Variables and Commands

The MDI interface adds a number of new *gdb* variables and commands which provide finer-grained control over the MDI library and its attached CPU than would normally be available with remote *gdb* targets.
```
set mdi stepinto
```

When set to on, an MDI single-step will always execute exactly one instruction - if an interrupt or exception occurs then execution will stop with the PC pointing to the start of the exception handler. In environments where interrupts are occurring faster than the time it takes to step through the interrupt handler, it may not be possible to make any progress in the foreground application in this mode. When off, a single-step will always execute one instruction in the foreground application, ignoring asynchronous interrupts. This may be implemented simply by disabling interrupts globally while single-stepping. The variable defaults to off.

```
set mdi threadstepall
```

Selects simultaneous TC stepping mode when scheduler locking is enabled. When on, all TCs are stepped together; otherwise single-stepping only enables execution in the selected TC. Defaults to off.

```
set mdi continueonclose
```

When set, the target will be told to restart CPU execution when *gdb* closes its MDI connection. If off, then the target will be reset when the connection is closed. Defaults to on.

```
set mdi rununcached
```

If on then the program's start address is forced to an uncached address, since it may need to initialise the caches before trying to execute code. When off the start address is not changed. Defaults to on.

```
set mdi waittime
```

Sets the number of milliseconds which MDI should wait before returning a result to *gdb*, when waiting for the run/halt state of the CPU to change. Some MDI libraries ignore this. It defaults to 10ms.

```
set mdi library NAME
```

The name of the MDI DLL to connect to. Initialised to the value of the GDBMDILIB environment variable, if available.

```
set mdi configfile NAME
```

The name of the MIPSsim CPU configuration file. Initialised to the value of the GDBMIPSSIMCONFIG environment variable, if available. See Section 3.1.1.5 "Non-standard MIPSsim<sup>TM</sup> Configurations".
```
set mdi devcfgfile NAME
```

The name of the MIPSsim device configuration file. Initialised to the value of the GDBMIPSSIMDEVCFG environment variable, if available. See Section 3.1.1.5 "Non-standard MIPSsim<sup>TM</sup> Configurations".

```
set mdi target TARGNUM
```

The MDI target group number to connect to. Defaults to the value of the GDBMDITARGET environment variable, if available.

```
set mdi device DEVNUM
```

The MDI device number to connect to. Defaults to the value of the GDBMDIDEVICE environment variable, if available.

```
show mdi devices
```

Displays a list of the available MDI target groups and devices. The MDI DLL library name must be known before this will work.

```
set mdi prompt
```

Sets the prompt to use when the application program requests console input. Defaults to "app>".

```
set mdi asid auto|off|on|ASID
```

Controls which address space to use when accessing mapped virtual addresses through the TLB, and for qualifying breakpoints. When set to "off" it uses the global address space; when "on" it uses the current ASID value in the CPU's *EntryHi* register; when "auto" it uses the global address space for unmapped addresses, and the current ASID for mapped addresses; otherwise it must be an explicit numeric ASID (0 to 255). Defaults to "auto".

Breakpoints use the same setting to qualify the breakpoint request, which on certain targets may allow breakpoints to be triggered only when executed by a specific ASID.

```
show mdi tlb [INDEX]
```

Displays the contents of the TLB. *INDEX* is an optional TLB index; if it is omitted the whole TLB is displayed.

```
set mdi tlb INDEX HI LO0 LO1 MASK
```

Programs the INDEX'th entry in the TLB using the values HI, LO0, LO1 and MASK.

```
show mdi cp0 REG[/BANK]
```

Displays arbitrary Coprocessor 0 registers which are not normally accessible via *gdb*. The argument *REG* is the register number; */BANK* is the optional bank number, default 0.
```
set mdi cp0 REG[/BANK] VALUE
```

Sets arbitrary Coprocessor 0 registers which are not normally accessible via *gdb*.

```
show mdi icache|dcache|scache ADDRESS[,SET]
```

Displays the contents of one line in the CPU's primary instruction, primary data or secondary cache. The *ADDRESS* argument is a byte offset into the cache, and *SET* is the cache set. Note that *SET* is optional, and if present a comma is required as separator between the two arguments; if absent then all sets at that cache offset are displayed.

This command has the side-effect of setting *gdb* internal variables \$ctag, \$cparity, \$cdata0, \$cdata1, etc to the values displayed. If multiple sets are displayed, then only the highest numbered set is recorded in these variables.

```
set mdi icache|dcache|scache ADDRESS,SET,TAG,PARITY,DATA,...
```

Sets the contents of one line in the CPU's primary instruction, primary data or secondary cache, using the values provided. Note that a comma is required as separator between the values.

```
set mdi connectreset on|off|N
```

See Section 3.1.2.4 "Resetting the CPU".

```
set mdi gmonfile NAME
```

Sets the file name to which *gdb* will write *gprof* profiling data, when enabled. The default file name is "gmon.out".

```
set mdi connecttimeout N
```

The number of seconds for which *gdb* will wait for a target to halt execution when first connecting to it. The default is 1 second; set it to 0 for an unlimited timeout. GDB may be safely interrupted while it is waiting for the halt to complete.

```
set mdi profile
```

If set to "on", and you are using the MIPSsim simulator, then *gdb* will tell the simulator to collect profiling information which *gdb* will write to gmonfile when the program exits.
If set to "auto", then *gdb* will automatically collect and output the profiling data, but only if your program contains the \_mcount symbol, which will be the case if your program was compiled with profiling enabled. The default is "auto".

```
set mdi profile-cycles
```

If set then, if MIPSsim profiling is enabled, *gdb* will tell the simulator to count cycles rather than instructions. This will only work if your MIPSsim software is licensed for cycle counting. Defaults to off. This can also be enabled using the "mdi cycles enable" command, described below. In MIPSsim 4.0 and above you select whether you want cycle counting or not by the MDI device which you connect to - this setting will have no effect.

```
set mdi profile-mcount
```

If set then *gdb* includes the \_mcount function in the profile data. Defaults to off, which doesn't profile \_mcount.

```
set mdi mcount-symbols SYM ...
```

A list of symbol names in the executable which may label the function which is used to collect call-graph profile data, and should be excluded from the profile data unless mdi profile-mcount is set. Defaults to "mcount".

```
set mdi ftext-symbols SYM ...
```

A list of symbol names in the executable which may define the start of the executable code segment, for profiling. Defaults to "\_ftext".

```
set mdi etext-symbols SYM ...
```

A list of symbol names in the executable which may define the end of the executable code segment, for profiling. Defaults to "ecode etext".

```
set mdi logfile NAME
```

Name of a file in which to store a trace of calls made to the MDI library, for troubleshooting. Requires that GDB's debug remote is set to 1 or 2. It must be set before issuing the target mdi command.

```
mdi cacheflush
```

Causes dirty lines in the CPU data cache to be written to memory, and then invalidates all CPU caches.

```
mdi cycles enable
```

Enables MIPSsim cycle counting, if licensed. From this point on *gdb*'s \$cycles convenience variable will be set to the current cycle count.
By using the command display \$cycles you can then see how many cycles have been used as you step through your code. In MIPSsim 4.0 and above you select whether you want cycle counting or not by the MDI device which you connect to - this command has no effect. Also in MIPSsim 4.0 the counter includes the cycles required to flush the pipeline when an MDI breakpoint or single-step causes execution to stop, and to restart the pipeline when resuming execution. So there will be an overhead per breakpoint or step command which you will need to subtract.

```
mdi cycles clear
```

Clears the MIPSsim cycle counter to zero, and then enables cycle counting.

```
mdi cycles disable
```

Disables MIPSsim cycle counting. Has no effect with MIPSsim 4.0 and above.

```
mdi cycles status
```

Reports on whether MIPSsim cycle counting is available, and if so whether it is enabled or disabled.

```
mdi reset WHAT
```

It may sometimes be useful to start over from the reset vector when debugging system firmware. The optional argument can be one of the following:

- `full`: Reset the entire target system, if possible. This is the default action if no argument is given, and is often the only action supported by the hardware. The CPU will exit the reset state and halt before fetching the first instruction from the memory location at the reset vector.
- `device`: If the device consists of a CPU plus peripherals, reset both if possible.
- `periph`: If the device consists of a CPU plus peripherals, reset just the peripherals if possible.
- `cpu`: If the device consists of a CPU plus peripherals, reset just the CPU if possible.

```
mdi regsync
```

Forces *gdb* to write back any modified register values to the target CPU. Normally this only occurs when *gdb* is about to restart execution of the application.

```
monitor COMMAND...
```

Sends the command line to the MDI library's "do command" interface. The command line is not interpreted by *gdb*.
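Several of the settings above are typically used together when starting a session. As a minimal sketch (the function name `mdi_session_commands` and the file name `session.gdb` are hypothetical, and the values shown are the ones used as examples earlier in this chapter), a shell function can emit the corresponding *gdb* commands so that they can be kept in a command file and replayed with *gdb*'s standard `-x` option:

```shell
# Print the gdb commands for a basic MDI session set-up.
# This is an illustrative helper, not part of the toolchain.
#   $1: MDI target group number
#   $2: MDI device number
#   $3: connectreset value (on, off, or N)
mdi_session_commands() {
    printf 'set mdi target %s\n' "$1"
    printf 'set mdi device %s\n' "$2"
    printf 'set mdi connectreset %s\n' "$3"
    # Settings are emitted first so that they are in place before connecting.
    printf 'target mdi\n'
}
```

For example, `mdi_session_commands 1 8 7 > session.gdb` followed by `mips-sde-elf-gdb -x session.gdb helloram` would select target group 1, device 8, with a connectreset value of 7, and then connect.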
#### 3.1.3.4 MDI Troubleshooting

If your MDI-connected probe or simulator appears to be misbehaving then it will help us to help you if you collect a log file which shows the MDI calls which occur between GDB and the MDI library. You may be able to work out what's going wrong for yourself by looking at this file, but if not then please send it to us along with a log of your GDB session.

You can create a log file by switching on remote debug mode **before** issuing the target command, and then repeating whatever commands cause your problem, e.g.

```
(gdb) set mdi logfile mdilog.txt
(gdb) set debug remote 2
(gdb) target mdi
...
(gdb) quit
```

## 3.2 Debugging with MIPS® MT ASE

To better understand the rest of this chapter, it will help if we first describe a couple of the fundamental terms defined in the MIPS MT (multi-threading) ASE:

- Thread Context (TC): The hardware state necessary to support a single thread of execution within a multithreaded CPU device. This includes a set of general purpose registers, multiplier registers, a program counter (PC) and a small amount of privileged state.
- VPE: A virtual processing element (VPE) is an instantiation of the full privileged CPU state on a multi-threaded CPU, sufficient to run an independent per-processor OS image; it can be thought of as a virtual CPU. Each VPE must have at least one TC bound to it in order to execute instructions and be debuggable, but it may contain more than one TC when running an explicitly multi-threaded OS or application. A conventional single-threaded CPU could be thought of as implementing a single TC bound to a single VPE.

These components of the MT ASE may be used within a variety of programming models, with different debugging methodologies:

- *LLMT*: Low-Level Multi-Threading (LLMT) describes programs which make explicit use of the hardware TCs to run multi-threaded code, but generally with no more software threads than there are hardware TCs.
This may be a self-contained threaded application, or a simple RTOS kernel. When debugging such software you will want to track the behaviour of the hardware TC states as they execute threads within your program. See Section 3.2.1 "Debugging LLMT Applications" below.

  Be aware that LLMT debugging only provides visibility of threads which are assigned to hardware TCs. Neither the probe, the simulator, nor GDB has the OS-specific knowledge to find and interpret a stored thread context in target memory. So if you are debugging a threaded application which has more threads than TCs to run them, presumably with a micro-kernel or RTOS to context switch the threads between TCs, then you will need to use the debugging facilities or thread-aware remote debugging protocol provided by the OS to debug application-level threads. The LLMT model may however be used to bring up and debug the kernel.
- *AP/RP*: The term AP/RP describes a programming model where the VPEs are treated as independent loosely coupled cores, with one acting as "Application Processor" (AP), running a complex operating system such as Linux; while the other acts as the "Realtime Processor" (RP), running dedicated real-time code without interference from the AP operating system's scheduling and interrupt handling. This mechanism has sometimes been known as AP/SP and AMVP.

  Debugging an application program on the AP side uses the standard OS application debugger, but debugging a program running on the RP side requires the *mips-sde-elf-gdb* debugger, with something like a remote serial or EJTAG probe connection to the RP VPE, as described in Section 3.2.3 "Debugging AP/RP Applications" below.
- *SMVP*: Describes the execution of a largely unmodified symmetric multi-processing (SMP) operating system by multiple VPEs.
In this environment application programs, including multi-threaded applications, will be debugged using the OS's usual debugger, and the underlying MT hardware will typically be "invisible" to the application programmer. See Section 3.2.4 "Debugging SMVP/SMTC Programs".
- *SMTC*: An extension of the SMVP model which requires more significant modifications to an SMP operating system, so that it can schedule multiple software threads and/or processes to run on the hardware TCs. As with SMVP, the OS's normal application debugger will typically be used for debugging threaded applications; for kernel debugging the LLMT model may be applicable.

### 3.2.1 Debugging LLMT Applications

When you debug low-level or operating system kernel code which makes explicit use of hardware TCs (the "LLMT" model described above) you can use *mips-sde-elf-gdb* in conjunction with an MDI library that supports the multi-threading extensions. At the time of writing this means a recent version of 34K MIPSsim or the FS2 EJTAG probe.

The following section assumes that you have successfully connected GDB to your target via a suitable MDI library, as described in Section 3.1 "MDI Debugging".

When you connect *mips-sde-elf-gdb* to an MT-capable CPU via an MT-aware MDI library, the hardware TCs can be accessed using GDB's thread debugging facilities - for full details of these commands see the "Debugging programs with multiple threads" section of the GNU GDB manual (supplied as HTML and PDF with the toolchain). To illustrate how these facilities map onto hardware TCs, the critical features are also documented below.

#### 3.2.1.1 Thread Status

Whenever execution stops and control returns to GDB, the debugger will display which TCs have been activated or deactivated since the last prompt and, if it has changed, the name of the current thread. One thread is always the "thread of interest" to which all GDB commands will apply by default, and this is the "current thread".
When GDB first regains control from the application the current thread will be the TC which hit the breakpoint, or completed a single-step. In the case of an asynchronous stop (e.g. typing Ctrl-C in command-line GDB) any TC may be chosen as the current thread. You can display a list of all active TCs, and their program counters within the program, as follows:

```
(gdb) info thread
  2 Thread Context 4 in client_thread()
* 3 Thread Context 2 in server_thread()
...
```

Note that there are two numbers on each line: first GDB's thread number, and secondly the hardware Thread Context (TC) number. All of the GDB thread commands work in terms of GDB thread numbers (the first number), not the hardware TC numbers (the second number). So this particular example tells you that GDB's thread number 2 corresponds to hardware TC 4, and its program counter is within the client\_thread() function; while GDB's thread number 3 corresponds to TC 2, and its program counter is within the server\_thread() function. Thread 3 (TC 2) is marked with an asterisk to indicate that it is the current thread.

#### 3.2.1.2 TC-specific Breakpoints

You can set a breakpoint that will only "trigger" when executed by a specific TC simply by appending the thread qualifier to a breakpoint command. For example:

```
(gdb) b send_message thread 2
```

This will set a breakpoint in the send\_message function to be activated only when executed by the "client" thread (TC 4) listed above. Beware that a software breakpoint exception will be taken by every TC which executes the breakpoint instruction, requiring communication between GDB and the target. GDB will quietly step over breakpoints which occur in the wrong TC, but performance will be substantially reduced.

It is not currently possible to specify a TC-specific hardware data watchpoint. A hardware watchpoint set up using GDB's watch, rwatch or awatch commands will trigger when any TC bound to the same VPE accesses that location in the specified manner.
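Since every GDB thread command takes GDB's thread number rather than the hardware TC number, it can be handy to tabulate the mapping shown by info thread, for instance before choosing the thread qualifier for a TC-specific breakpoint. The following Python sketch is an illustration only and is not part of the toolchain: the function name parse_info_threads is hypothetical, and the output layout it expects is assumed from the example listing in Section 3.2.1.1.

```python
import re

# Matches lines such as "* 3 Thread Context 2 in server_thread()".
# The layout is an assumption taken from the example listing above.
THREAD_LINE = re.compile(r"^\s*(\*)?\s*(\d+)\s+Thread Context\s+(\d+)\b")


def parse_info_threads(text):
    """Return (mapping, current), where mapping is {gdb_thread: hardware_tc}
    and current is the GDB thread number marked with an asterisk (or None)."""
    mapping = {}
    current = None
    for line in text.splitlines():
        match = THREAD_LINE.match(line)
        if match is None:
            continue  # skip the prompt line and any unrelated output
        star, gdb_id, tc = match.groups()
        mapping[int(gdb_id)] = int(tc)
        if star:
            current = int(gdb_id)
    return mapping, current


sample = """\
(gdb) info thread
  2 Thread Context 4 in client_thread()
* 3 Thread Context 2 in server_thread()
"""
mapping, current = parse_info_threads(sample)
print(mapping)   # {2: 4, 3: 2}
print(current)   # 3
```

With the listing from Section 3.2.1.1 this reports that GDB thread 2 is TC 4 and GDB thread 3 is TC 2, so a breakpoint for the "client" TC is qualified with thread 2, not thread 4.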
#### 3.2.1.3 Thread-specific Commands

You can switch GDB from one TC to another using the thread command, or its alias t, e.g.

```
(gdb) thread 2      # switch to "client" thread
(gdb) bt            # do stack trace of "client"
(gdb) t 3           # switch to "server" thread
(gdb) info reg      # display registers of "server"
```

Alternatively you can perform the same operation on a number of threads at once, e.g.

```
(gdb) thread apply 1 2 7 4 bt     # apply backtrace cmd to threads 1,2,7,4
(gdb) thread apply 2-7 9 p foo    # apply p foo cmd to threads 2->7 & 9
(gdb) t apply all x/i $pc         # apply x/i $pc cmd to all threads
```

#### 3.2.1.4 Resuming threaded execution

Normally when you issue a single-step command there is no guarantee which TCs will run in which order - they might even hit a breakpoint before your single-step request completes, "seizing the prompt" away from your original thread of interest. A sequence of single-step commands may switch you back and forth between your active TCs, or just advance the highest priority one. To avoid this happening while single stepping you may disable execution of all other TCs on the same VPE, apart from the currently selected TC, by using this command:

```
(gdb) set scheduler-locking step
```

But beware that this can get you into situations where the TC which you are stepping cannot make any progress, because it is waiting for a semaphore or mutex to be unlocked by another TC - so it is not always the most appropriate behaviour. Furthermore this command:

```
(gdb) set scheduler-locking on
```

will prevent other TCs on the same VPE from running in all cases, even when you resume execution using commands like continue, until or finish.

Finally, rather than locking out the other TCs altogether, you can request that all TCs should "gang step" together. This requires both GDB scheduler-locking and "mdi threadstepall" to be set.
For example

```
(gdb) set scheduler-locking on
(gdb) set mdi threadstepall on
```

In summary:

| Scheduler-locking | MDI threadstepall | Single-step Behavior |
|-------------------|-------------------|----------------------|
| off | X | Current TC single-steps, all other TCs run freely until the current TC completes an instruction, or one of the other TCs hits a breakpoint |
| on or step | off | Current TC single-steps, all other TCs are suspended |
| on or step | on | All TCs single-step together - the first to complete an instruction returns GDB to command mode, selected as the current thread |

## 3.2.2 Debugging Multiple VPEs

Debugging multiple VPEs within a multi-threaded core is very similar to debugging multiple independent CPUs within a multi-core system.

#### 3.2.2.1 Multiple VPEs with FS2 probe

The rest of this section assumes that you have installed, configured and selected the EJTAG probe software as your current MDI target, as described in Section 3.1.2 "MDI Debugging with an EJTAG Probe". For reliable multi-VPE debugging it is recommended that you use version 2.1.6.7 or higher of the FS2 probe software.

When you start GDB with the probe connected to a 34Kc core, you should see something like this in response to the show mdi devices command:

```
(gdb) show mdi devices
Targ 01: mips-single-core
  Dev 01: mips-single-core-root
Targ 02: mips-dual-cores
  Dev 01: mips-dual-cores-mips1
  Dev 02: mips-dual-cores-mips2
Targ 03: mips-34k
  Dev 01: mips-34k-vpe1
  Dev 02: mips-34k-vpe0
```

If that works as described, then you should now be able to connect to VPE0.
Assuming the same numbering as above, use the "target mdi 3:2" command, which should result in output as follows (the last line, especially the address reported, will vary):

```
(gdb) target mdi 3:2
Selected device mips-34k-vpe0 on MIPS unknown
[New Thread Context 0]
Connected to MDI target
0x8010049c in ?? ()
```

You are now set up and ready for debugging.

#### **General VPE Debugging with Probe**

To access a VPE within a multi-threaded CPU the appropriate target group number must be used (e.g. target group 3 in the example above). Within that group the device number 1 corresponds to VPE1 and the device number 2 corresponds to VPE0. Thus to attach to VPE0 you need to use "target mdi" and then select group number 3 and then device number 2 interactively, or alternatively use "target mdi 3:2". Likewise for VPE1 you could use "target mdi 3:1".

A given VPE can only be usefully connected to if it has at least one thread context (TC) bound to it. Therefore with the default configuration, VPE0 can be controlled straight from RESET, but VPE1 can only be controlled once some code has been run to bind a TC to it. However, GDB may be attached to a disabled VPE and it will keep waiting until it has been activated.

GDB may be used to load a program to be debugged to the target. A typical session in this case is going to include the following commands in the given order:

```
$ mips-sde-elf-gdb program      ... start GDB and load program's symbol table
(gdb) target mdi 3:2,rst=7      ... connect to VPE0, and reinitialise the target
(gdb) load                      ... transfer the program to the target's memory
(gdb) break function            ... set a software breakpoint on a function
(gdb) run                       ... start execution
```

Note the rst=7 option when connecting to VPE0. That tells GDB to reset the target CPU just after establishing the connection. Execution is then resumed and the target is allowed to run freely for seven seconds, after which it is halted. The intent is to let the firmware (e.g.
YAMON) initialize board resources, in particular the caches and memory controller, so that the target can accept a program image. Note that this will reset the whole board and CPU, not just the selected VPE, so it would not make sense to use this option when connecting to VPE1.

Depending on your setup, to load a large program into memory you may either use the probe via the GDB load command, or it may be faster to use the system's firmware. For the latter, and a board like the Malta, that would be YAMON (see the YAMON manual for how to do this); for other systems it would be system-specific.

Sometimes you may be debugging the firmware itself. An example session may look like this:

```
$ mips-sde-elf-gdb program      ... start GDB and load program's symbol table
(gdb) target mdi 3:2            ... connect to VPE0 - VPE0 is halted
(gdb) break function            ... set a software breakpoint on a function
(gdb) continue                  ... resume execution of already loaded program/firmware
```

If it's the firmware being debugged, it may sometimes be useful to start over from the reset vector. For this, the "mdireset" command may be useful - this resets the target system entirely. The CPU will exit the reset state and stop before fetching the first instruction from the memory location at the reset vector. You will normally want to issue this command from a debugger connected to VPE0, as VPE1 will become inactive as a result.

When a debugging session is terminated, the VPE can either be left halted or execution may be resumed, therefore letting code that has been *previously* debugged run freely. Use the following commands to control that behaviour:

```
(gdb) set mdi continueonclose on     ... to resume execution
(gdb) set mdi continueonclose off    ... to keep the target halted
(gdb) show mdi continueonclose       ... to retrieve the current setting
```

#### 3.2.2.2 Multiple VPEs on the MIPSsim™ Simulator

The rest of this section assumes that you have installed, configured and selected the MIPSsim simulator as your current MDI target, as described in Section 3.1.1 "MDI Debugging with the MIPSsim™ Simulator". For reliable multi-VPE debugging it is recommended that you use version 4.6.36 or higher of the MIPSsim software.

MIPSsim provides two ways of working with multiple VPEs. One uses a single MDI device to access the whole core: all thread contexts are accessible through a single connection to the device, regardless of the VPE to which they are bound. The other way uses a pair of separate MDI devices where each has access only to thread contexts bound to the corresponding VPE. The second method requires an auxiliary program called **mipssimd** that controls internal communication between the two MDI devices - this tool is supplied as part of the MIPSsim package, but is not currently available for Windows hosts.

#### Setting up mipssimd

Working with **mipssimd** requires additional settings to be present in environment variables. They are necessary for the program to create identifiable communication channels with clients connecting to VPE 0 and VPE 1 of the same simulated processor. System V IPC is used. The variables are as follows:

```
$MIPS_MDI_IPC_KEY
```

defines a file to be used as a key to identify this particular instance of a simulated processor. The file has to exist and be accessible. This variable is also used by GDB to select which instance of **mipssimd** to communicate with.

```
$MIPS_MDI_IPC_CLIENTS
```

defines the number of clients to be handled. For the 34K family this has to be set to "2" for the 2 VPEs the processor implements.

```
$MIPS_MDI_IPC_CLIENT_ID
```

defines the number of the communication channel to use between the debugger and a single instance of **mipssimd**, starting from "0". For the 34K this can be either "0" or "1".
This variable is only used by GDB, and each instance of GDB should have a different value.

With \$MIPS\_MDI\_IPC\_KEY and \$MIPS\_MDI\_IPC\_CLIENTS set up you should be able to start **mipssimd**. But before that, it's generally a good idea to clean up any leftover state in IPC resources that may have been left from previous **mipssimd** runs. There is a dedicated program included with MIPSsim that does that. To run it, enter the "mdiipcwatchdog cleanup" command. You should get output like below (obviously the path to the key file will differ, depending on the value of \$MIPS\_MDI\_IPC\_KEY, as may the key and the seed). The following example assumes a Bourne-style shell; for a C shell use the setenv command.

```
$ touch /home/joe/.MIPS_MDI_IPC_KEY
$ export MIPS_MDI_IPC_KEY=/home/joe/.MIPS_MDI_IPC_KEY
$ export MIPS_MDI_IPC_CLIENTS=2
$ export MIPS_MDI_IPC_CLIENT_ID=0
$ mdiipcwatchdog cleanup
Destroying shared memory and semaphores.
Generated key '0x1157340' using key string /home/joe/.MIPS_MDI_IPC_KEY' and key seed 0x1
```

This cleanup step is not required before running **mipssimd** for the first time, but as a side effect it also validates the setup, so running it anyway is a sensible idea.

Now run **mipssimd**; you should see output as follows (again, the path to the key file will likely differ):

```
$ mipssimd -p -f
Starting up.....
Establishing connection with debuggers using key /home/joe/.MIPS_MDI_IPC_KEY'.
Support up to 2 debugging clients
Ready to handle IPC commands from debugger #0.
Ready to handle IPC commands from debugger #1.
```

If this works, then you are ready to start working with **mipssimd**. The options given to **mipssimd** above have the following meaning:

- -p stands for "persistent" and makes **mipssimd** preserve the state of the simulated system between MDI connections
- -f stands for "forever" and makes **mipssimd** keep running even when the last client disconnects.
The result is to make **mipssimd** behave like a real h/w CPU, allowing multiple debugger connections to be opened and closed, until it is terminated.

Once you finish debugging, you may terminate **mipssimd** by sending it the SIGINT signal. This is done in a system-specific way, usually by typing <Ctrl>+<C>, also written as ^C (run "stty -a" and see the entry marked intr= for the character used on a given system), or by using the shell's kill command. With the latter, bear in mind that **mipssimd** is multithreaded and all threads must be terminated.

#### **General VPE Debugging with Simulator**

MIPSsim provides two target group numbers for the 34K - number 21 is for the instruction-accurate simulator and number 22 is for the cycle-counting version. Within each of the groups six devices are defined as follows:

| Device | Description |
|--------|--------------------------|
| 1 | Whole CPU, little-endian |
| 2 | Whole CPU, big-endian |
| 3 | VPE0, little-endian |
| 4 | VPE0, big-endian |
| 5 | VPE1, little-endian |
| 6 | VPE1, big-endian |

Thus to attach to VPE0 of a big-endian, cycle-counting 34K you need to use the "target mdi" command and select the group number 22 and then the device number 4 interactively, or alternatively use "target mdi 22:4". Similarly for whole-CPU access to a little-endian, instruction-accurate 34K you may either select the group number 21 and then the device number 1, or use "target mdi 21:1".

GDB may be used to load a program to be debugged to the target. A typical session in this case is going to include the following commands in the given order:

```
$ mips-sde-elf-gdb program      ... start GDB and load program's symbol table
(gdb) target mdi 21:2           ... connect to whole 34K, instruction accurate, big-endian
(gdb) load                      ... transfer the program to the target's memory
(gdb) break function            ... set a software breakpoint on a function
(gdb) run                       ... start execution
```

Normally GDB generates a MIPSsim configuration file on the fly from a template (installed as <install\_top\_dir>/share/mipssim.cfg) and uses this whenever a target is opened. If the default settings are unsuitable, then a custom configuration file may be used. Once such a file has been created, use the following command in GDB to use it instead of the default auto-generated file:

```
(gdb) set mdi configfile filename
```

Sometimes it's useful to start debugging a program that has already been loaded into MIPSsim memory; this can be done using the APP\_FILE setting in a MIPSsim configuration file. An example session may then look like this:

```
$ mips-sde-elf-gdb program              ... start GDB and load program's symbol table
(gdb) set mdi configfile myconfigfile   ... select custom configuration file
(gdb) target mdi 22:1                   ... connect to whole 34K, cycle-accurate, little-endian
                                        ... the simulator loads the application executable file
                                        ... the device is halted
(gdb) break function                    ... set a software breakpoint on a function
(gdb) continue                          ... resume execution under control of GDB
```

## 3.2.3 Debugging AP/RP Applications

The mechanism for debugging a program running on the RP side of an AP/RP system is similar to downloading and running a "bare-iron" program on a target board connected by a serial port or network. It is also possible to debug RP programs using an EJTAG probe.

#### 3.2.3.1 Using the SP Debugging Daemon

This mechanism allows debugging of an RP program without use of a h/w EJTAG probe. The remote debug connection is via TCP/IP, with the GDB remote debug protocol transported between *mips-sde-elf-gdb* on the development host, through a network server running on the target CPU's AP Linux VPE, and then via a shared memory FIFO to the RP VPE.

The following example demonstrates how to debug an RP application running on a Malta board. Debugging an RP application on MIPSsim is not currently possible using this mechanism.

1.
Build your application:

```
devhost$ mips-sde-elf-gcc -T <linker script> minimon.c -g -o minimon.x
```

2. If you haven't already done so, then start the SP debugging daemon on your Malta Linux target:

```
aplinux$ spd &
```

3. Open a new connection to your Malta (e.g. using telnet, ssh, rlogin, etc.) in another terminal window, and start the AP/RP rtterm (real-time terminal) application. This will allow you to communicate with the running RP program's virtual console:

```
aplinux/2$ rtterm
```

4. Now transfer the minimal program to your Malta board and "download" it into the Realtime Processor by sending it to the /dev/vpe1 device on the Application Processor Linux host, e.g. in your original window:

```
aplinux$ cat minirel >/dev/vpe1
```

5. Now start *mips-sde-elf-gdb* (on your development host) and connect it via TCP/IP to the debug server running on the target board:

```
devhost$ mips-sde-elf-gdb minirel
(gdb) target remote aplinux:2222
```

In the above, *aplinux* represents the network hostname or IP address of your Malta board. GDB automatically determines the load address of the relocatable program, and relocates its symbol table data to match.

6. Now you can set breakpoints and enter the GDB c or continue command to start the program running under the control of GDB. Don't use the r or run command to start execution, since this would restart the RP program from its entry point.

7. Note that the spd remote debugging daemon does not currently support interrupt requests from GDB, so it is not possible to break into a runaway RP application from GDB by typing Control-C or pressing the Insight Stop button. To diagnose such problems you will need to use other techniques such as breakpoints to find the problem; or use an EJTAG probe, which can interrupt any program, whatever its state.

#### 3.2.3.2 AP/RP Debugging with EJTAG Probe

Refer to Section 3.2.2 "Debugging Multiple VPEs" for general information on using a probe to debug multiple VPEs.
The usual method of debugging an AP/RP Linux kernel with an EJTAG probe is to let the firmware load and start Linux and then attach to VPE0 (the Linux AP) which is already running. The same applies to VPE1 (the RP program), except that the Linux VPE loader is used to load and start the program. Since the probe firmware does not know the load address of a relocatable RP program, and cannot tell GDB how to relocate its symbol table, it's usually easier to debug a fully linked RP executable (i.e. an executable called "\*ram" rather than "\*rel").

If debugging of either AP or RP from the very beginning of the loaded program is required, then hardware execution breakpoints may be placed at the entry point. Use GDB's hbreak command for this. It accepts any syntax that is valid for the break command; in particular absolute numeric addresses may be specified after an asterisk. As the command uses a hardware breakpoint register in the debug port of the core it has to be issued to the correct VPE and will not affect the other VPE.

If the RP-side VPE to be debugged is inactive, then there is no way to set a hardware breakpoint, since GDB will stall on an attempt to connect, waiting for the VPE to become activated. GDB will pass control to the user as soon as the VPE becomes active and before the first instruction of the program has been executed.

## AP/RP Team Debugging

When debugging, it is sometimes desirable for the VPEs involved to be stopped and resumed synchronously, so that the state of the target system remains as stable as possible during debug accesses. GDB provides a way of doing that by grouping VPEs, and potentially any devices, into so-called teams. While a single instance of GDB can only fully control one device at a time, including the device in a team with other devices causes stop and resume requests to be propagated to all of them.
If any of the other devices have instances of GDB attached to them, these requests are transparent to their controlling debuggers. Specifically a device in a team that has been stopped by another debugger, but not the controlling one, stops, but continues reporting the running state to the latter. Likewise a device that has been resumed by the controlling debugger starts reporting the running state, but resumes only after all the other debuggers have resumed it.

The following commands are used to control teams:

```
target mdi <device>, team=<device>[, team=<device>...]
```

Open the device specified at the beginning and attach it, together with the ones listed as "team=" arguments, to the currently selected team.

```
mdi team attach <device> [<device>...]
```

Attach listed devices to the currently selected team.

```
mdi team detach <device> [<device>...]
```

Detach listed devices from the currently selected team.

```
mdi team clear
```

Destroy the currently selected team, removing all members beforehand; the currently selected team is then set to "0".

```
mdi team list
```

List identifiers and members of the currently existing teams.

```
set mdi team <id>
```

Select a team identifier for further team operations; "0" means a new team will be created for attachment operations.

```
mdi team <id>
```

Shorthand for "set mdi team <id>".

```
show mdi team
```

Print the identifier of the currently selected team.

```
mdi team
```

Shorthand for "show mdi team".

The use of these commands is incompatible with group debugging as described in Section 3.2.4.2 "SMVP/SMTC using FS2 Probe and Group Debugging".

## AP/RP Debugging with MIPSsim

Note that this feature is not supported by the current release of the AP/RP package for TimeSys Linux.

One way to debug such a setup is to use a custom MIPSsim configuration file to load and run the AP/RP Linux kernel. It can be used straight from GDB using the "set mdi configfile" command.
In such a setup, after opening the target, the programs referred to from the configuration files will have been loaded into MIPSsim memory and may be started just by issuing the continue command. Soon you will see Linux kernel messages being output. Depending on whether **mipssimd** is used or not, they will appear through **mipssimd** or GDB's window. This communication channel is actually the Linux console and, once user mode is reached, it will accept input as well.

Similarly with VPE1 (RP), the Linux VPE loader is the usual way of starting the program, rather than loading it through the MDI interface. It's usually easier to use a fully linked executable, as described above for the EJTAG probe. Memory space for loading such an executable has to be reserved in the MIPSsim device configuration file - the one provided with the Linux AP/RP package may be used as the starting point (see the MIPSsim documentation for anything that is not immediately obvious in the file).

Since the simulator returns control to GDB after loading Linux, the kernel may also be debugged from the very beginning as is - rather than issuing continue you may use any commands, like step or break, to set up debugging as required.

If debugging of the RP program from the very start of the loaded program is required, then a hardware execution breakpoint may be placed at the entry point. Use the hbreak command of GDB for that. If split per-VPE devices and **mipssimd** are used, then hbreak has to be issued to the correct VPE and it will not affect the other one. A connection to the VPE1 device has to be made and the breakpoint set within it; the breakpoint will trigger as soon as VPE1 executes the instruction there.
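Connecting to a per-VPE device such as VPE1 means picking the right MIPSsim group and device numbers. As a convenience, the numbering given in "General VPE Debugging with Simulator" can be captured in a small lookup. The following Python sketch is an illustration only: the helper name mdi_target_command and its parameter names are hypothetical, and only the group and device numbers themselves come from this manual.

```python
# Group and device numbers as listed in "General VPE Debugging with
# Simulator". The helper name and its parameters are hypothetical and
# exist only for this illustration.
GROUPS = {"instruction-accurate": 21, "cycle-counting": 22}
DEVICES = {
    ("whole", "little"): 1,
    ("whole", "big"): 2,
    ("vpe0", "little"): 3,
    ("vpe0", "big"): 4,
    ("vpe1", "little"): 5,
    ("vpe1", "big"): 6,
}


def mdi_target_command(scope, endian, model="instruction-accurate"):
    """Build the GDB "target mdi" command for a simulated 34K device.

    scope is "whole", "vpe0" or "vpe1"; endian is "little" or "big".
    Unknown values raise KeyError rather than guessing a device.
    """
    return "target mdi %d:%d" % (GROUPS[model], DEVICES[(scope, endian)])


print(mdi_target_command("vpe1", "little"))                # target mdi 21:5
print(mdi_target_command("vpe0", "big", "cycle-counting"))  # target mdi 22:4
```

The second call reproduces the "target mdi 22:4" example given earlier for VPE0 of a big-endian, cycle-counting 34K.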
## 3.2.4 Debugging SMVP/SMTC Programs

#### 3.2.4.1 SMVP/SMTC using MIPSsim® Simulator

On MIPSsim the use of the "whole CPU" device to debug shared program image operating systems running across multiple VPEs is recommended - refer to Section 3.2.2.2 "Multiple VPEs on the MIPSsim™ Simulator" above. In this case TCs bound to any VPE all become visible and controllable as threads within GDB, as described in Section 3.2.1 "Debugging LLMT Applications" above. All active VPEs also halt and resume execution simultaneously. This is probably what you would expect anyway, when debugging SMP operating systems on a single 34K.

#### 3.2.4.2 SMVP/SMTC using FS2 Probe and Group Debugging

Group debugging allows synchronous control of multiple devices by a single instance of GDB. All thread contexts of all open devices are seen as threads of a single running program. This is most useful for debugging SMP-style execution environments, though it is not strictly required for each of the devices to execute the same code.

Internally the devices are synchronised to one another, that is, events causing one device to stop freeze all the other ones and if GDB decides to return control to the user, then all the threads have their state preserved as of the time of the event, subject to hardware or simulator limitations.

The FS2 MDI libraries prior to version 2.1.8.0 do not fully support device synchronization. When using them, GDB still permits doing group debugging, but there is no synchronisation between devices and the state preserved will only be a rough approximation of what the system would look like if a debugger was not attached. This may still be useful for debugging systems which have no strict timing restrictions.

Use the following command to debug a group of devices:

```
target mdi <device>, group=<device>[, group=<device>...]
```

Open all the devices requested at once.
The use of this command is incompatible with team debugging as described in the Section "AP/RP Team Debugging".

Here is an *example* session for SMTC Linux:

```
(gdb) file ./vmlinux
Reading symbols from /home/macro/linux/vmlinux...done.
(gdb) target mdi 2:2,group=2:1,rst=0
Selected device mips-dual-cores-mips2 on MIPS unknown
Selected device mips-dual-cores-mips1 on MIPS unknown
[New Thread Context 2:2:0]
Connected to MDI target
0xbfc00000 in ?? ()
(gdb) continue
Continuing.
[Here Linux is started from YAMON.]
^C
Quit received: Stopping target
[New Thread Context 2:2:0]
[New Thread Context 2:2:1]
[New Thread Context 2:2:2]
[New Thread Context 2:1:3]
[New Thread Context 2:1:4]
Program received signal SIGINT, Interrupt.
[Switching to Thread Context 2:1:3]
0x80101e6c in r4k_wait () at arch/mips/kernel/cpu-probe.c:48
__asm__(".set\tmips3\n\t"
(gdb) info threads
  5 Thread Context 2:1:4 0x80101e6c in r4k_wait () at arch/mips/kernel/cpu-probe.c:48
  4 Thread Context 2:1:3 0x80101e6c in r4k_wait () at arch/mips/kernel/cpu-probe.c:48
  3 Thread Context 2:2:2 r4k_wait () at arch/mips/kernel/cpu-probe.c:48
  2 Thread Context 2:2:1 r4k_wait () at arch/mips/kernel/cpu-probe.c:48
  1 Thread Context 2:2:0 0x8035a5f4 in _spin_unlock_irqrestore (lock=0x80407774, flags=1024) at kernel/spinlock.c:284
```

Notice how device numbers are reported as a prefix to the thread context numbers above.

# 3.3 Debugging with the GNU Simulator

You can debug a program using the GNU MIPS simulator which is built into *mips-sde-elf-gdb*. It works much like any other remote debug mechanism - in fact internally it looks to *gdb* like a remote board.

As supplied the GNU simulator does not simulate i/o devices<sup>1</sup>, just a bare MIPS architecture CPU, RAM and a set of PROM monitor entry points. So you can't use the GNU simulator to run programs built for a real hardware target like a Malta board - you must build your programs specifically for the GNU simulator target, e.g. SBD=GSIM32B.
You can see your program's output in gdb's console window. If your program attempts to read from its console, then you can input text through gdb's console window when you see the app> prompt. Your program can also read and write files on your host computer.

# 3.4 Remote Serial Port Debugging

If you've got a MIPS Technologies evaluation board such as the Malta or SEAD-2 boards, but you haven't got an EJTAG probe, then you'll probably be debugging your programs using a remote debug protocol over the serial port.

<sup>1.</sup> Actually, if you are brave, then it is possible to add device models to the GNU simulator by editing the source.

You also might need to use serial debugging in other cases, such as when you need to debug a multi-threaded application or RTOS, which requires a debug protocol that can handle software thread contexts - for example MDI can provide access to low-level hardware TCs on a multi-threaded CPU (see Section 3.2 "Debugging with MIPS® MT ASE"), but does not know how to find or interpret the state of a software thread which is not currently assigned to a hardware TC.

#### **GDB Serial Ports**

When you connect to a target using a serial (RS232) port, you have to tell gdb the name of the port device to use. In the examples which follow we've chosen to use the Linux device name /dev/ttyS0, but this is operating system specific, and you'll have to use different names as appropriate for your host. Table 3.2 gives a list of possible names for different operating systems.

| Host | Device Names |
|---------|------------------------|
| Linux | /dev/ttyS0, /dev/ttyS1 |
| Windows | /dev/com1, /dev/com2 |
| Solaris | /dev/ttya, /dev/ttyb |

Table 3.2 Host O/S Serial Port Devices

#### **GDB Serial Protocols**

There are several different ways that a MIPS program can be debugged remotely, and the distinction often causes confusion.

1. Using the default *gdb* serial remote debug protocol, support for which is built into the YAMON monitor on MIPS Technologies boards, or
2. Using the historical MIPS Computer Systems remote debug protocol, as implemented in some PROM monitors (e.g. IDT/sim and PMON). But this mechanism is no longer documented in this manual. It is a completely different debug protocol, and requires different commands to get it started.

The amount of data passed back and forth between the board and gdb means that some operations can be quite slow at 38400 baud (the YAMON monitor's default speed). You can use mips-sde-elf-gdb's -b option, its "set remotebaud" command, or the Target Selection dialog in the GUI, to raise the serial line speed to 57600 or 115200 baud, if the target board can handle it.

Where the host/target link is slow it's quicker to set gdb temporary breakpoints (the tbreak command) and then continue, rather than doing repeated step commands. You can also speed things up by enabling gdb's memory transfer cache using the "set remotecache" command, but don't do that if you plan to use gdb to access device registers or shared memory.

## 3.4.1 Serial Debugging with the YAMON™ Monitor

The YAMON PROM monitor supplied on MIPS Technologies' Atlas, Malta and SEAD-2 boards implements *gdb*'s default remote debug protocol. The YAMON *gdb* protocol is hardwired to use the board's second serial port (tty1), so you will usually need two serial connections between the host and the board: one connected to a terminal emulator for the console, and one used by *gdb* for the remote debug protocol.

The YAMON monitor runs its serial ports at a default 38400 baud, and in some cases (slow FPGA-based cores) may require hardware flow-control to avoid UART receive buffer overruns. This can be enabled by *gdb*'s set remoteflow command, or using the h/w flow control tickbox in the GUI's "File->Target Settings..." dialog.

#### 3.4.1.1 YAMON™ Monitor - Serial Download

Follow this example to load a program xxxram over a serial port to a board running the YAMON monitor (e.g. built with SBD=MALTA32L).
| Host System |
|--------------------------------|
| $ mips-sde-elf-gdb xxxram |
| (gdb) set remotebaud 38400 |
| (gdb) set remoteflow on |
| (gdb) b main |
| (gdb) target remote /dev/ttyS0 |
| (gdb) load |
| (gdb) cont |

## 3.4.1.2 YAMON Monitor - TFTP Download

If you have an Ethernet connection to your board and a TFTP server on your host, then you can avoid a long serial download by downloading your program over Ethernet with the YAMON monitor's load command, and then starting *gdb* as follows:

YAMON> load tftp://192.168.1.1/xxxram.s3

To simplify this further you could set the YAMON $start environment variable to run the YAMON load and gdb commands after every reset.

## 3.4.2 Serial Comms Fault Finding

If your target board is not quite capable of keeping up with the data rate from the host (which can happen if your UART doesn't have a FIFO), or if some error is occurring in the remote debug protocol code, then *mips-sde-elf-gdb* may run very slowly, or mysteriously time out the connection. If this happens then you should try switching on serial port logging in *gdb* before issuing the target command, and then repeat whatever commands cause the problem, e.g.

```
(gdb) set remotelogfile log.txt
```

When you close the target connection, the named file will contain a trace of all data sent and received by gdb. You can also try

```
(gdb) set debug remote 1
```

which tells the higher-level remote protocol code to output debug information about its activity.

With the YAMON monitor you can ask the remote end to output a debug protocol log to the console, by starting it up with the **-v** flag, like this:

```
YAMON> gdb -v
```

The debug trace information is naturally somewhat cryptic if you are not familiar with the protocols, but you may be able to identify dropped characters or other problems.
If you need to contact us with a debug comms problem, then it will be helpful if you can email the trace information to us.

## 3.5 Debugging C++

C++ debugging works as advertised in the GDB manual, so long as you use the default DWARF-2 debug format. The DWARF-1 format does not support C++.

# **Manual Downloading**

After the linker has generated an executable object file you may want to download it manually to a PROM programmer, or an evaluation board.

## 4.1 Evaluation Board Download

Usually you'll download your code using *mips-sde-elf-gdb* as part of a debugging session, as described in the previous chapter. But sometimes you might need to download your program manually. There are usually two steps:

1. While some evaluation boards have an Ethernet interface which allows them to load object files directly at very high speed, most others require that the object file is first converted into some other format (ASCII or encoded binary). The *mips-sde-elf-conv* program performs the task of converting an executable object file into a number of different formats, including: Motorola S-records, LSI Logic PMON fast format, IDT/sim binary S-records, and Stag PROM programmer binary format. See [Conv] for full option details.
2. Finally you can perform the download via a serial or parallel port. It may also be possible to use the download features of your favourite terminal emulator; consult your board manual for details.

Note that when you download to an evaluation board, you will usually want the program and its data to be loaded at the load addresses assigned by the linker, so **DO NOT** use *mips-sde-elf-conv*'s **-p** (prom) option to create your downloadable file. The example makefiles follow this rule when building the ram and standalone versions of a program, as opposed to the rommable version.

The actual process of downloading to an evaluation board is highly dependent on the board and its PROM monitor.
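To give a feel for what an ASCII download format looks like, here is a minimal sketch of how one Motorola S3 record (32-bit address) could be built. It follows the generic S-record layout; whether *mips-sde-elf-conv* emits exactly this record type by default is an assumption, and the function name `srec_s3_format` is hypothetical rather than part of the toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one S3 record into out.
   Layout: "S3", byte count, 32-bit address, data bytes, checksum.
   The byte count covers the address, the data and the checksum.
   The checksum is the ones' complement of the low byte of the sum
   of the count, address and data bytes.
   Returns the number of characters written (excluding the NUL). */
int srec_s3_format(char *out, size_t outsz, uint32_t addr,
                   const uint8_t *data, size_t len)
{
    assert(len <= 250);             /* count must fit in one byte */
    assert(outsz >= 2 * len + 15);  /* 14 fixed chars + data + NUL */

    unsigned count = (unsigned)len + 5;  /* 4 address + 1 checksum */
    unsigned sum = count;
    int n = snprintf(out, outsz, "S3%02X%08lX", count, (unsigned long)addr);

    for (int shift = 24; shift >= 0; shift -= 8)
        sum += (addr >> shift) & 0xff;

    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        n += snprintf(out + n, outsz - (size_t)n, "%02X", (unsigned)data[i]);
    }
    n += snprintf(out + n, outsz - (size_t)n, "%02X", (~sum) & 0xffu);
    return n;
}
```

For example, four data bytes 01 02 03 04 at address 0x80020000 would produce the record S30980020000010203046A.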
## 4.2 PROM Programmer Download

The other situation when manual downloading is required is when blowing a PROM. In this case it is usually necessary for the code and data to be relocated from their linker-assigned addresses into offsets from the start of the ROM. The ROM startup code will then relocate the initialised data, and possibly the code too, from ROM to RAM.

The *mips-sde-elf-conv* **-p** (prom) option helps with this. It ensures that ROM resident code and read-only data is placed at its correct offset in the ROM image, and then places the initialised, writable data segment at the next 16-byte boundary following.

*mips-sde-elf-conv* also contains facilities for splitting an object file into horizontal and/or vertical slices, including interleaving, to accommodate dumb programmers (the machines, not the people!).

The physical process of downloading to the PROM programmer is device-dependent. You should refer to your PROM programmer's manual for instructions.

## 4.2.1 Other Techniques

Downloading large programs via a serial port is very slow and tedious. There is no reason why a faster technique cannot be used for downloading the program, and you may want to use some other high-speed mechanism on your own board (e.g. a Centronics parallel interface, a PCI bus, USB, or whatever). To help with this process you may want to examine the sources of *convert* (aka *mips-sde-elf-conv*) programs in the source code tarball.

# **Porting an ISO / ANSI C Program**

This chapter is intended to help you port an existing C application or benchmark program that is compatible with the C library defined by the ISO C90 or ANSI X3J11 standard. Most simple, self-contained programs will port with no difficulty.

## 5.1 Common Problems When Converting to MIPS® Architecture

These remaining points are general warnings about idiosyncrasies of the MIPS architecture and its compilers, which can cause confusion when porting programs.
- *Unaligned addresses*: Will cause an "Address Error" exception (a **SIGBUS** signal). This won't affect most programs, since the compiler correctly aligns structure fields unless specifically instructed otherwise. The malloc() family also aligns all requests to an 8-byte boundary (the maximum ever required by the CPU). But beware when type-casting pointers to small types into pointers to larger types (you can try using the compiler's -Wcast-align option to catch these).
- *Null pointer references*: Will cause a "TLB Miss" exception (a **SIGSEGV** signal), unless you set up a dummy TLB mapping for address 0. Memory is normally accessed through the cacheable KSEG0 or uncacheable KSEG1 address spaces, which begin at 0x80000000 and 0xa0000000 respectively.
- *Use of "short" variables*: Often prevalent in programs written for 16-bit or x86 processors, this generates inefficient code on MIPS architecture processors, particularly if used for loop counters and array indices. There are no MIPS instructions which operate on sub-32-bit values, so such operations have to be synthesised from multiple instructions. Although the compiler attempts to avoid excessive conversions, always use *int* for such purposes, unless you specifically need the semantics of 16-bit arithmetic.
- *Character signedness*: ISO and ANSI C permit "char" variables to be implemented as either signed or unsigned; it's compiler dependent. MIPS compilers historically made "char" variables default to unsigned (because it makes faster code). If your program has been developed in a context where those variables were signed, it may not work correctly on MIPS; you may get caught out by mistakes like assigning the integer result of getc() to a char variable, and then comparing that with EOF (integer -1). You can specify "signed char" explicitly for individual variables, which will make your code more portable. But if the assumption is deeply ingrained in your application, then you can use the compiler's **-fsigned-char** option, which changes the default.
- *Bitfield signedness*: Some compilers arbitrarily treat bitfields as implicitly unsigned, but this is not the case for GCC, which uses your type definition as written. But accessing signed bitfields generates slower code, especially when using the MIPS16 ASE. You can either modify your structure definitions to add explicit "unsigned" type qualifiers, or change GCC's default behaviour using its -funsigned-bitfields option.
- *Small variables of 8 bytes or less*: They are stored separately from larger variables, to allow them to be accessed more quickly. This can cause strange link-time errors if you have not declared your global variables consistently in all modules ("relocation truncated" is the usual one).

# **MIPS® Architecture Intrinsics**

The MIPS architecture includes a number of instructions and registers that can't be accessed directly by C and C++ code. The bare-iron/ELF library includes a set of *intrinsics*, which provide access to these special-purpose instructions. They are often implemented in header files, using *gcc* inline *asms* - which means that you can read, modify and reuse them for your own purposes.

This chapter describes only application-level MIPS intrinsics - for intrinsics which access a CPU's "system" facilities, see Section 7.4 "System Coprocessor (CP0) Intrinsics".

## 6.1 Intrinsics for Byte Swapping

Include the header file <*mips/endian.h*> to define the inline functions listed below.
On a MIPS32 Release 2 CPU, they will generate a fast, two-instruction sequence; on other MIPS ISAs, they will generate a longer sequence of shifts, ands, and ors. They are also smart enough to byte-swap constants at compile time.

```
uint32_t htobe32(uint32_t val)
```

Convert the 32-bit value val from "host" byte order to big-endian byte order (this will be a no-op on a big-endian CPU).

```
uint16_t htobe16(uint16_t val)
```

Convert the 16-bit value val to big-endian format.

```
uint32_t betoh32(uint32_t val)
```

Convert 32-bit big-endian value val to the "host" byte order (this will be a no-op on a big-endian CPU).

```
uint16_t betoh16(uint16_t val)
```

Convert 16-bit big-endian value val to the "host" byte order.

```
uint32_t htole32(uint32_t val)
uint16_t htole16(uint16_t val)
uint32_t letoh32(uint32_t val)
uint16_t letoh16(uint16_t val)
```

As above, but converting to and from little-endian.

## 6.2 Intrinsics for MIPS32® Architecture

The MIPS32 and MIPS64 instruction set architectures include the count-leading-zeroes and count-leading-ones instructions. The bare-iron/ELF library provides this C interface, implemented by inline *asms* on MIPS32 and MIPS64 CPUs, or as a subroutine call on older MIPS architectures. To use these functions include the header file <*mips/mips32.h*>.

```
uint32_t mips_clz(uint32_t val)
```

The 32-bit argument val is scanned from most-significant to least-significant bit, and the number of leading zeros is returned. If no bits were set, the value 32 is returned.

```
uint32_t mips_clo(uint32_t val)
```

The 32-bit argument val is scanned from most-significant to least-significant bit, and the number of leading ones is returned. If all bits were set, the value 32 is returned.

```
uint32_t mips_dclz(uint64_t val)
```

The 64-bit argument val is scanned from most-significant to least-significant bit, and the number of leading zeros is returned. If no bits were set, the value 64 is returned.
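The 32-bit counting rules can be modelled in portable C, which is handy for checking code on a host before running it on a target. The sketch below is only a reference model of the documented results, not the library's implementation; the `ref_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the documented mips_clz() result:
   scan from bit 31 downwards, return 32 if no bit is set. */
uint32_t ref_clz32(uint32_t val)
{
    uint32_t n = 0;
    while (n < 32 && !(val & (UINT32_C(1) << (31 - n))))
        n++;
    return n;
}

/* Leading ones are the leading zeros of the complement. */
uint32_t ref_clo32(uint32_t val)
{
    return ref_clz32(~val);
}

/* A typical use: floor(log2(val)) for a non-zero value. */
uint32_t ref_ilog2(uint32_t val)
{
    assert(val != 0);
    return 31 - ref_clz32(val);
}
```

The 64-bit variants follow the same pattern with a 64-bit scan and a limit of 64.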
```
uint32_t mips_dclo(uint64_t val)
```

The 64-bit argument val is scanned from most-significant to least-significant bit, and the number of leading ones is returned. If all bits were set, the value 64 is returned.

## 6.3 Intrinsics for MIPS32® Release 2 Architecture

The MIPS32 Release 2 ISA introduces a number of new user-level instructions. Some of them will be happily used by the compiler to optimize normal C code. But some of the byte- and bit-shuffling instructions are not available for normal C code, so these intrinsics are made available by including <*mips/mips32.h*>:

```
uint32_t _mips32r2_bswapw(uint32_t val)
```

Byte swap the 32-bit value val, a two-instruction sequence. It is normally more efficient to use the intrinsics described in Section 6.1 "Intrinsics for Byte Swapping".

```
uint32_t _mips32r2_wsbh(uint32_t val)
```

Return the result of the MIPS32 Release 2 wsbh instruction given val.

```
uint32_t _mips32r2_ins(uint32_t tgt, uint32_t val, uint32_t pos, uint32_t sz)
```

Return the result of a 32-bit insert bit field instruction, inserting sz bits of val into tgt, at bit position pos. Both pos and sz must be constants.

```
uint32_t _mips32r2_ext(uint32_t x, uint32_t pos, uint32_t sz)
```

Return the result of a 32-bit unsigned extract bit field instruction, returning sz bits, from bit position pos, of x. Both pos and sz must be constants.

## 6.4 Intrinsics for MIPS64® Release 2 Architecture

The MIPS64 ISA inherits the MIPS32 instructions and their intrinsics, but (as one would expect) adds some 64-bit equivalents:

```
uint64_t _mips64r2_bswapd(uint64_t val)
```

Byte swap the 64-bit value val, a two-instruction sequence.

```
uint64_t _mips64r2_dsbh(uint64_t val)
```

Return the result of the MIPS64 Release 2 dsbh instruction given val.

```
uint64_t _mips64r2_dshd(uint64_t val)
```

Return the result of the MIPS64 Release 2 dshd instruction given val.
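The 32-bit insert and extract bit field operations from Section 6.3 (and, by extension, their 64-bit counterparts in this section) can be modelled in portable C. This is a sketch of the semantics as described, with hypothetical `ref_` names, intended for host-side checking only.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low sz bits set (sz in 1..32). */
static uint32_t field_mask(uint32_t sz)
{
    return sz == 32 ? UINT32_MAX : (UINT32_C(1) << sz) - 1;
}

/* Model of an unsigned extract: sz bits of x starting at bit pos. */
uint32_t ref_ext(uint32_t x, uint32_t pos, uint32_t sz)
{
    assert(sz >= 1 && pos < 32 && pos + sz <= 32);
    return (x >> pos) & field_mask(sz);
}

/* Model of an insert: the low sz bits of val replace the sz bits
   of tgt starting at bit pos; all other bits of tgt are kept. */
uint32_t ref_ins(uint32_t tgt, uint32_t val, uint32_t pos, uint32_t sz)
{
    assert(sz >= 1 && pos < 32 && pos + sz <= 32);
    uint32_t m = field_mask(sz) << pos;
    return (tgt & ~m) | ((val << pos) & m);
}
```

For instance, extracting 8 bits at position 8 from 0x12345678 yields 0x56, and inserting 0 into 8 bits at position 4 of 0xffffffff yields 0xfffff00f.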
```
uint64_t _mips64r2_dins(uint64_t tgt, uint64_t val, uint32_t pos, uint32_t sz)
```

Return the result of a 64-bit insert bit field instruction, inserting sz bits of val into tgt, at bit position pos. Both pos and sz must be constants.

```
uint64_t _mips64r2_dext(uint64_t x, uint64_t pos, uint32_t sz)
```

Return the result of a 64-bit unsigned extract bit field instruction, returning sz bits, from bit position pos, of x. Both pos and sz must be constants.

## 6.5 Intrinsics for CorExtend™ Extension

MIPS Technologies' Pro Series™ CPU cores include the CorExtend feature, which extends the instruction set by adding a small number of user-definable instructions (UDIs). The Pro Series cores then provide an on-chip interface which allows a customer building an SoC to add just the logic to implement their chosen instructions; the interface to the CPU pipeline and its general-purpose registers is provided by the core.

The UDI instructions commonly have the standard MIPS three-operand format, where they can use two registers as source operands and one as destination. The two source registers are decoded inside the CPU core, and sent to the customer's UDI block, and so they can only be encoded in the standard position. The register number to which to write the result is selected by the UDI block, so in principle can be any CPU register or none, including one of the source registers; but it would be eccentric and unhelpful to specify a separate destination register and not use the standard MIPS format to do it. Instructions which don't use all the possible general purpose registers can recycle the register fields for other purposes.

The assembler interface to UDI provides you with choices about how you construct the instruction:

```
udi IMM :
```

All 24 user-definable bits of the instruction are set by integer IMM, including the register and opcode fields.
```
udiOP IMM :
```

OP is an integer (0 to 15) which defines the UDI opcode, and IMM the remaining 20 user-definable bits.

```
udiOP rs, IMM :
```

OP is the UDI opcode, rs the register number (read-only, or read-write), and IMM the remaining 15 bits.

```
udiOP rs, rt, IMM :
```

rs would conventionally be read-only, but rt read-only or read-write. IMM is the remaining 10 bits.

```
udiOP rs,rt,rd,IMM :
```

rs and rt would conventionally be read-only, and rd write-only, a conventional MIPS three-operand instruction, with IMM defining the remaining 5 bits.

If a register field in a UDI instruction isn't a general purpose register, but a register in the UDI block, or extra opcode bits, then use the $n syntax to insert a 5-bit immediate into the field, e.g. udi, $a0,$10,$v0,12.

In the bare-iron/ELF library you get an interface to the UDI instructions; you'll need to #include <mips/udi.h>.

The GNU compiler can optimize code around the asm() statements used to build this interface; and that's great. But some UDI instructions may alter internal state or registers in the UDI block which aren't visible to the compiler, making those optimizations incorrect. If your UDI instruction generates no state except for what it writes to the CPU destination register, then you can use the "safe" intrinsics, and the optimizer can work its magic.

In the description below, OP is the UDI opcode (0 to 15); A and B are any valid C or C++ scalar integer-valued expressions, and IMM is a constant to fill the remaining instruction bits. The compiler allocates registers to hold the A and B source operands, and the result register.

```
/* Simple UDI instructions are assumed to write a result to their
   final CPU register operand, but may have other side effects such
   as using or modifying internal UDI registers, so they won't be
   optimized by the compiler. */

/* The 'ri' single register intrinsic passes A in the RS field, and
   returns the new RS register. IMM is the remaining 15 bits. */
typeof A mips_udi_ri (OP, A, IMM);

/* The 'rwi' two register intrinsic passes A in the RS field, and
   returns the new RT register. IMM is the remaining 10 bits. */
typeof A mips_udi_rwi (OP, A, IMM);

/* The 'rri' two register intrinsic passes A in RS, B in RT, and
   returns the new RT register. IMM is the remaining 10 bits. */
typeof A mips_udi_rri (OP, A, B, IMM);

/* The 'rrwi' three register intrinsic passes A in RS, B in RT, and
   returns the w/o RD register. IMM is the remaining 5 bits. */
typeof A mips_udi_rrwi (OP, A, B, IMM);

/* Optimizable intrinsics for UDI instructions which read only the
   CPU source registers and write to the destination CPU register
   only, and have no other side effects, i.e. they only use and
   modify the supplied CPU registers. */
typeof A mips_udi_ri_safe (OP, A, IMM);
typeof A mips_udi_rwi_safe (OP, A, IMM);
typeof A mips_udi_rri_safe (OP, A, B, IMM);
typeof A mips_udi_rrwi_safe (OP, A, B, IMM);

/* The mips_udi_i() intrinsics use no register inputs, but return
   the value written to the RS register (the input value is assumed
   discarded). */
uint32_t mips_udi_i (OP, IMM);
uint64_t mips_udi_i_64 (OP, IMM);

/* "NoValue" intrinsics for UDI instructions which don't write a
   result to a CPU register, so presumably must have some other
   side effect, such as modifying an internal UDI register. */
void mips_udi_nv (IMM);
void mips_udi_i_nv (OP, IMM);
void mips_udi_ri_nv (OP, A, IMM);
void mips_udi_rri_nv (OP, A, B, IMM);
```

To provide even more flexibility, the following set of intrinsics allow register fields in the UDI instructions to be set to constant 5-bit immediates (0-31), possibly to identify registers inside the UDI block, or as extra opcode bits. The IS, IT and ID arguments below must be constants, which will be inserted into the rs, rt and rd fields of the instruction, as appropriate. Arguments A and B will still be computed and assigned to registers by the compiler.
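To picture how the opcode, the three 5-bit register fields and the 5-bit immediate share the 24 user-definable bits, here is a sketch that packs them into an instruction word. The field positions are the standard MIPS three-operand ones; the placement of the UDI opcode in the low four bits of a SPECIAL2 function code (base word 0x70000010) is an assumption about the encoding and should be checked against your core's documentation. The function name `udi_encode` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for the "udiOP rs,rt,rd,IMM" form.
   Assumed layout (verify against your core's manual):
     bits 31..26  major opcode SPECIAL2 (0x1c)
     bits 25..21  rs field
     bits 20..16  rt field
     bits 15..11  rd field
     bits 10..6   5-bit immediate
     bits  5..0   function code 0x10 | OP  */
uint32_t udi_encode(unsigned op, unsigned rs, unsigned rt,
                    unsigned rd, unsigned imm5)
{
    assert(op < 16 && rs < 32 && rt < 32 && rd < 32 && imm5 < 32);
    return UINT32_C(0x70000000)
         | ((uint32_t)rs << 21)
         | ((uint32_t)rt << 16)
         | ((uint32_t)rd << 11)
         | ((uint32_t)imm5 << 6)
         | 0x10u | op;
}
```

Under these assumptions, opcode 0 with field values 4, 10, 2 and an immediate of 12 would encode as 0x708A1310. A field that names an internal UDI register simply carries that register's number in the same position.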
UDI instructions are allowed to write to any general purpose register, not just those named in the instruction - so the destination register may be implicit in the opcode. To handle this the GPDEST argument allows the programmer to explicitly specify the general purpose register number that is written, and this prevents the compiler from allocating that register for other variables across the UDI instruction; if no general purpose CPU register is written, pass a GPDEST of zero.

```
/* These 4 variants of the three register operand format allow
   constant values to be placed in the RS, RT fields, presumably
   because they name internal UDI registers. The RD register is
   still allocated by the compiler. They are implicitly "unsafe"
   or volatile. */
typeof A mips_udi_riri (OP, A, IT, IMM);
typeof B mips_udi_irri (OP, IS, B, IMM);
int32_t mips_udi_iiri_32 (OP, IS, IT, IMM);
int64_t mips_udi_iiri_64 (OP, IS, IT, IMM);

/* These 5 variants of the three register format allow constant
   values to be placed in the RS, RT and RD fields, presumably
   because they name internal UDI registers. In case the
   instruction writes to an implicit gp register, pass the register
   number as GPDEST and the compiler will be told that it's been
   clobbered, and its value will be returned - if no gp register is
   written, pass 0. They are all implicitly unsafe, or volatile. */
typeof A mips_udi_rrii (OP, A, B, ID, IMM, GPDEST);
typeof A mips_udi_riii (OP, A, IT, ID, IMM, GPDEST);
typeof B mips_udi_irii (OP, IS, B, ID, IMM, GPDEST);
int32_t mips_udi_iiii_32 (OP, IS, IT, ID, IMM, GPDEST);
int64_t mips_udi_iiii_64 (OP, IS, IT, ID, IMM, GPDEST);
```

**Warning:** The compiler assumes that all *asm* inputs are "word sized", i.e. that the inputs have the same precision as the underlying register size, and it may emit instructions to sign- or zero-extend any inputs which are smaller than that (e.g. char and short operands). To avoid an excessive number of these extension instructions you should try to ensure that you always pass "word sized" values to these intrinsics.

**Warning 2**: The GCC asm statement does not allow you to use aggregate values (a *struct*, *union* or *array*) as inputs or outputs for an asm - you may only pass simple scalar values. If you need to pass aggregate values to or from a UDI instruction, then you must define a union to smuggle them through. For example:

```
/* object manipulated by UDI hardware */
typedef struct {
    uint16_t imag;
    uint16_t real;
} complex_t;

/* access mechanism for UDI intrinsics */
typedef union {
    complex_t c;
    uint32_t w;
} udicomplex_t;

/* add two complex types using three operand UDI instruction */
extern inline complex_t
do_ADD (const complex_t *a, const complex_t *b)
{
    const udicomplex_t *ua = (const udicomplex_t *) a;
    const udicomplex_t *ub = (const udicomplex_t *) b;
    udicomplex_t uv;
    uv.w = mips_udi_rrwi_safe (ADD_OPCODE, ua->w, ub->w, 0);
    return uv.c;
}
```

## 6.6 Intrinsics for COP2 Extension

Some MIPS Technologies CPU cores allow an SoC builder to design a tightly-coupled coprocessor which implements the COP2 instructions. These instructions are a part of the MIPS32 and MIPS64 ISAs reserved for use only by coprocessors. For the C interface to these instructions you must #include <mips/cop2.h>, which defines the following intrinsics:

```
void mips_lwc2 (C2REG, MEM);
```

Load the 32-bit word in memory referenced by MEM into COP2 data register C2REG (constant 0-31). The form of MEM is basically a 32-bit value obtained through a pointer, as in:

```
int *a;
mips_lwc2 (3, *a);
```

It's there so you can load a memory value directly into a COP2 register without loading it first into a general-purpose register.

```
void mips_swc2 (C2DREG, MEM);
```

The opposite - store COP2 data register C2DREG to a memory location.

```
void mips_ldc2 (C2DREG, MEM);
void mips_sdc2 (C2DREG, MEM);
```

64-bit load/store respectively.
Particularly important if your CPU has only 32-bit general purpose registers.

```
void mips_mtc2 (VAL, C2DREG, SEL);
```

Write any 32-bit expression VAL to COP2 register C2DREG in register bank SEL.

```
uint32_t mips_mfc2 (C2DREG, SEL);
```

Return the 32-bit COP2 register C2DREG/SEL.

```
void mips_dmtc2 (VAL, C2DREG, SEL);
uint64_t mips_dmfc2 (C2DREG, SEL);
```

64-bit versions of the above.

```
void mips_ctc2 (VAL, C2CREG);
```

Write any 32-bit C expression VAL to COP2 control register C2CREG.

```
uint32_t mips_cfc2 (C2CREG);
```

Return the 32-bit COP2 control register C2CREG.

```
void mips_cop2 (OP);
```

Emit arbitrary coprocessor 2 instruction with "undefined" bits set by constant integer OP.

```
int mips_c2t (CC);
```

Returns one if coprocessor 2 condition bit CC (0-7) is "true", zero otherwise.

```
int mips_c2f (CC);
```

Returns one if coprocessor 2 condition bit CC is "false", zero otherwise.

## 6.7 Intrinsics for SmartMIPS® ASE

MIPS Technologies' 4KSc and 4KSd CPU cores implement the SmartMIPS ASE (application specific extension) to the base MIPS32 instruction set. The bit-rotate and indexed load instructions will be used automatically by the compiler when you use the **-msmartmips** compiler option. The other new instructions may be used from C code by using the intrinsics defined by #include <mips/smartmips.h>, as follows:

```
int mips_multp (int a, int b)
```

Return the low 32-bit result of the polynomial-basis multiplication of the two 32-bit binary polynomial arguments a and b.

```
int mips_maddp (int acc, int a, int b)
```

Return the low 32-bit result of the polynomial-basis multiplication of arguments a and b, polynomially added to acc. This can be used with mips_multp to construct a polynomial multiply-add loop which can be optimized by the compiler. For example:

```
int maddp_arr (int *arr, int narr, int factor)
{
    int acc, i;
    acc = mips_multp (arr[0], factor);
    for (i = 1; i < narr; i++)
        acc = mips_maddp (acc, arr[i], factor);
    return acc;
}
```

```
int mips_maddp2 (int a, int b)
```

Like mips_maddp, but assumes that you've already loaded the accumulator (the LO register) in some other way that is not visible to the compiler.

```
long long mips_multpx (int a, int b)
long long mips_maddpx (long long acc, int a, int b)
long long mips_maddp2x (int a, int b)
```

Like mips_multp etc, but operating on the full 64-bit multiplier result, i.e. the HI, LO register pair.

```
int mips_mfxu (void)
```

Return the extra high-order bits (bits 64 and upwards) of the multiply accumulator register (the new SmartMIPS ACX register). This is destructive of the accumulator, so use with care.

```
int mips_mfhu (void)
```

Return bits 32-63 of the multiply accumulator (the HI register). This is destructive.

```
int mips_mflhxu (int acc, int &lo)
```

Stores the low 32 bits of the multiply accumulator in acc into the lvalue "reference" argument lo, and then shifts the multiply accumulator right by 32 bits, returning the shifted accumulator. For example:

```
unsigned int
mpmadd (unsigned int *arr, unsigned int *spill, int narr, int factor)
{
    unsigned int acc = 0;
    int i, j;
    for (i = j = 0; i < narr; i += 4, j++) {
        acc += arr[i+0] * factor;
        acc += arr[i+1] * factor;
        acc += arr[i+2] * factor;
        acc += arr[i+3] * factor;
        acc = mips_mflhxu (acc, spill[j]);
    }
    return acc;
}
```

```
long long mips_mflhxux (long long acc, int &lo)
```

Like mips_mflhxu etc, but operating on the full 64-bit multiplier result, i.e. the HI, LO register pair.

```
void mips_mtlhx (int lo, int hi, int ex)
```

Moves the three 32-bit values in arguments lo, hi, and ex to the multiplier result registers (LO, HI and ACX).
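For readers unfamiliar with polynomial-basis arithmetic, the multiply used by mips_multp and friends can be modelled in portable C as a carry-less multiply over GF(2), where "polynomial addition" is exclusive-or. This is only a host-side reference sketch of the arithmetic; the `ref_` names are hypothetical and the code says nothing about how the SmartMIPS hardware is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit polynomial (carry-less) product of a and b:
   for each set bit i of b, XOR in a shifted left by i. */
uint64_t ref_multpx(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            r ^= (uint64_t)a << i;
    return r;
}

/* Low 32 bits of the product, as mips_multp is described. */
uint32_t ref_multp(uint32_t a, uint32_t b)
{
    return (uint32_t)ref_multpx(a, b);
}

/* Polynomial multiply-add: addition over GF(2) is XOR. */
uint32_t ref_maddp(uint32_t acc, uint32_t a, uint32_t b)
{
    return acc ^ ref_multp(a, b);
}

/* Small sanity check: (x + 1) * (x + 1) = x^2 + 1 over GF(2). */
void ref_poly_selfcheck(void)
{
    assert(ref_multp(3, 3) == 5);
}
```

Note how this differs from integer multiplication: 3 times 3 gives 5 rather than 9, because the cross terms cancel instead of carrying.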
```
void mips_pperm (int src, int sel)
```

Shift the 96-bit (max) extended multiplier result registers 6 bits left, and mix in 6 bits of src, permuted by sel. See the SmartMIPS **pperm** instruction definition for details.

## 6.8 Intrinsics for Paired-single/MIPS-3D® Architecture

This version of GCC includes support in the compiler for the paired-single SIMD floating point data type and instructions, and the MIPS-3D ASE. Full details of the vector data types and intrinsics can be found in the *Target Builtins* section of the [Gcc] Reference Manual.

## 6.9 Intrinsics for MIPS MT ASE

The new instructions introduced by the MIPS MT ASE may be accessed from code using the intrinsics defined by #include <mips/mt.h>, as follows:

```
unsigned int mips_mt_fork (void *addr, unsigned int pv, unsigned int cv)
```

Fork to addr, returning pv to parent and cv to child.

```
unsigned int mips_mt_yield (unsigned int yq)
```

Yield with qualifier yq, returning active signals.

```
int mips_mt_dmt (void)
```

Disable MT, returning old enable state.

```
int mips_mt_emt (void)
```

Enable MT, returning old enable state.

```
int mips_mt_dvpe (void)
```

Disable multi-VPE mode, returning old enable state.

```
int mips_mt_evpe (void)
```

Enable multi-VPE mode, returning old enable state.

Other functions in this header file provide access to the new Coprocessor 0 registers provided by the MT ASE, and to registers within other thread contexts. See Section 7.4 "System Coprocessor (CP0) Intrinsics" for a listing.

## 6.10 Intrinsics for MIPS DSP ASE

The MIPS DSP ASE defines a set of new instructions to improve the performance of DSP and "Media" applications.

Many of these new DSP instructions operate on Q15 or Q31 fractional data. Q31 is a 32-bit fixed-point fraction which can represent numbers between -1 and very nearly 1, and Q15 is a similar 16-bit fraction. The DSP ASE's favorite 8-bit quantity is an unsigned fraction representing numbers between 0 and 255/256.
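To make those ranges concrete, the following host-side sketch converts between floating point and the Q15 and Q31 encodings by scaling with 2^15 and 2^31. The helper names are hypothetical, the conversions truncate toward zero, and the typedefs simply mirror the idea that a Q15 is held in a signed 16-bit integer and a Q31 in a signed 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15;   /* 16-bit fraction, range [-1, 1 - 2^-15] */
typedef int32_t q31;   /* 32-bit fraction, range [-1, 1 - 2^-31] */

/* Scale by 2^15; the value must lie in [-1, 1). */
q15 q15_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q15)(x * 32768.0);
}

double q15_to_double(q15 v)
{
    return v / 32768.0;
}

/* Scale by 2^31; the value must lie in [-1, 1). */
q31 q31_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q31)(x * 2147483648.0);
}

double q31_to_double(q31 v)
{
    return v / 2147483648.0;
}
```

The largest positive Q15 value is 32767, which corresponds to 32767/32768; this is why the range stops "very nearly" at 1.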
Vectors of 4 x unsigned bytes or 2 x Q15 fractions fit into a 32-bit register, and the DSP ASE includes instructions which operate on all members of a vector at once. For detailed information about the MIPS DSP ASE (and a proper description of fractional data types), see the MIPS DSP ASE documentation [MD00374].

Addition and subtraction on fractional data are really the same as addition and subtraction with unsigned integer data, but multiplication requires a post-multiply shift to align the resulting values appropriately. The new multiply instructions in the DSP ASE that operate on fractional data provide this shift operation.

We do not (yet) have a compiler which knows about fractions. Q15 is an alias for a signed 16-bit integer (short), and Q31 is an alias for a signed 32-bit integer (int).

This document describes some new vector data types and built-in intrinsic functions available under the GNU C compiler. Each instruction in the DSP ASE has its own intrinsic, so you can write anything in C.

To tell GCC to compile for a CPU with DSP ASE support, pass the compiler the **-mdsp** flag.

## 6.10.1 Vector Data Types

Some typedefs:

```
typedef v4q7 __attribute__ ((mode(V4QI)));
typedef v2q15 __attribute__ ((mode(V2HI)));
typedef v4i8 __attribute__ ((mode(V4QI)));
typedef v2i16 __attribute__ ((mode(V2HI)));

v2i16  a vector of two 16-bit integers.
v4i8   a vector of four 8-bit integers.
v4q7   a vector of four Q7 fractions.
v2q15  a vector of two Q15 fractions.

You can initialize vectors like this:

v4i8 a = {1, 2, 3, 4};
v4i8 b;
b = (v4i8) {5, 6, 7, 8};
v2q15 a = {0x0fcb, 0x3a75};
```

**Caution:** When the compiler lets you see inside vectors and other packed data, you see the components in the order they occupy in memory when you store the vector. But instructions in the DSP ASE locate vector subcomponents with reference to register bit-numbers.
The relationship between bit-numbers and memory addresses changes with the CPU's endianness, so initializers like this are endianness-dependent. If you're big-endian, then at the C level you'll see the high-bit-number components first - the DSP ASE refers to these as *left* and uses an l (letter "l", that is) in instruction names. If you're little-endian, then at the C level you'll see the lower-bit-numbered components first - what the DSP ASE calls *right*, using an r in the instruction name. When little-endian, in fact, the one on the left is on the right: perhaps it's better to use a line break between the elements!

To initialize fractional values it's sometimes convenient to do this:

```
v2q15 b;
b = (v2q15) {0.1234 * 32768.0, 0.4567 * 32768.0};
```

The multiplication by 32768.0 effectively shifts the binary point by 15 bits, which is just what you want for a Q15. To initialize a Q31 variable, you need a 31-bit shift, so multiply by 2147483648.0.

You can use a union type to access vector components. Again, the relationship between the components named in your union and those seen by the DSP ASE will be endianness-dependent.

```
/* 'v4i8' Example */
typedef union {
    v4i8 a;
    char b[4];
} v4i8_union;

v4i8 i;
char j, k, l, m;
v4i8_union temp;

/* Assume we want to extract from i. */
temp.a = i;
j = temp.b[0];
k = temp.b[1];
l = temp.b[2];
m = temp.b[3];

/* Assume we want to assign j, k, l, m to i. */
temp.b[0] = j;
temp.b[1] = k;
temp.b[2] = l;
temp.b[3] = m;
i = temp.a;

/* 'v2q15' Example */
typedef union {
    v2q15 a;
    q15 b[2];
} v2q15_union;

v2q15 i;
q15 j, k;
v2q15_union temp;

/* Assume we want to extract from i. */
temp.a = i;
j = temp.b[0];
k = temp.b[1];

/* Assume we want to assign j, k to i. */
temp.b[0] = j;
temp.b[1] = k;
i = temp.a;
```

### 6.10.2 Scalar data types

```
#include <stdint.h>
typedef int32_t q31;
typedef int32_t i32;
typedef uint32_t ui32;
typedef int64_t a64;
```

q31 is really just an alias for a 32-bit signed integer, but an argument or return value with this type reminds you that the data is being interpreted as a Q31 fraction. The same goes for q15.

i32 and ui32 are there for C purists, since there's no guarantee that a simple int is 32 bits.

a64 is an alias for long long (which for MIPS GCC is a 64-bit signed integer). We use it to remind you that the underlying instruction is using one of the four 64-bit accumulators defined by the DSP ASE (\$ac0, \$ac1, \$ac2, \$ac3). If you're already familiar with the MIPS architecture, note that \$ac0 comprises the bits of the hi/lo registers used in regular MIPS32 multiply/divide instructions.

Note that some parameters of the builtin functions have the following types:

- imm0\_7: the parameter must be a constant in the range 0 to 7.
- imm0\_15: the parameter must be a constant in the range 0 to 15.
- imm0\_31: the parameter must be a constant in the range 0 to 31.
- imm0\_63: the parameter must be a constant in the range 0 to 63.
- imm0\_255: the parameter must be a constant in the range 0 to 255.
- imm0\_1023: the parameter must be a constant in the range 0 to 1023.
- imm1\_32: the parameter must be a constant in the range 1 to 32.
- imm\_n32\_31: the parameter must be a constant in the range -32 to 31.

### 6.10.3 Compiler Builtin Functions

The DSP ASE instruction names are full of "." (period) characters, which are not legal as part of C names. To make a C name, each period is replaced by "\_" (underscore), and the assembler name is prefixed with "\_\_builtin\_mips\_". So the instruction called addq.ph becomes \_\_builtin\_mips\_addq\_ph.
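The naming rule above is mechanical, so it can be sketched in a few lines of portable C. This is only an illustration of the rule as stated in the text; `dsp_builtin_name()` is a hypothetical helper and is not part of GCC or of the bare-iron/ELF library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the builtin name for a DSP ASE mnemonic by
   prefixing "__builtin_mips_" and replacing each '.' with '_'.
   Returns 0 on success, -1 if the output buffer is too small. */
static int dsp_builtin_name(const char *insn, char *out, size_t outsz)
{
    static const char prefix[] = "__builtin_mips_";
    size_t plen = sizeof prefix - 1;
    size_t ilen, i;

    assert(insn != NULL && out != NULL);
    ilen = strlen(insn);
    if (plen + ilen + 1 > outsz)
        return -1;

    memcpy(out, prefix, plen);
    for (i = 0; i < ilen; i++)
        out[plen + i] = (insn[i] == '.') ? '_' : insn[i];
    out[plen + ilen] = '\0';
    return 0;
}
```

Called with "addq.ph" and a sufficiently large buffer, the helper produces the same name given in the text, \_\_builtin\_mips\_addq\_ph.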
Note that where there are two variants of an underlying DSP instruction, one accepting an immediate operand and the other a variable/register operand, the compiler will automatically pick the correct instruction depending on the type and size of the operand.

The instructions are listed in alphabetical order. Blank lines have been introduced to separate unlike instructions, but there's no other hint as to what they do.

```
v2q15 __builtin_mips_absq_s_ph (v2q15);
q31 __builtin_mips_absq_s_w (q31);

v2q15 __builtin_mips_addq_ph (v2q15, v2q15);
v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15);
q31 __builtin_mips_addq_s_w (q31, q31);
i32 __builtin_mips_addsc (i32, i32);
i32 __builtin_mips_addwc (i32, i32);
v4i8 __builtin_mips_addu_qb (v4i8, v4i8);
v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8);

i32 __builtin_mips_bitrev (i32);
i32 __builtin_mips_bposge32 (void);

void __builtin_mips_cmp_eq_ph (v2q15, v2q15);
void __builtin_mips_cmp_le_ph (v2q15, v2q15);
void __builtin_mips_cmp_lt_ph (v2q15, v2q15);
i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8);
i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8);
i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8);
void __builtin_mips_cmpu_eq_qb (v4i8, v4i8);
void __builtin_mips_cmpu_le_qb (v4i8, v4i8);
void __builtin_mips_cmpu_lt_qb (v4i8, v4i8);

a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15);
a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31);
a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8);
a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8);
a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15);
a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31);
a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8);
a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8);

i32 __builtin_mips_extp (a64, i32);
i32 __builtin_mips_extpdp (a64, i32);
i32 __builtin_mips_extr_r_w (a64, i32);
i32 __builtin_mips_extr_rs_w (a64, i32);
i32 __builtin_mips_extr_s_h (a64, i32);
i32 __builtin_mips_extr_w (a64, i32);

i32 __builtin_mips_insv (i32, i32);

i32 __builtin_mips_lbux (void *, i32);
i32 __builtin_mips_lhx (void *, i32);
i32 __builtin_mips_lwx (void *, i32);

a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15);
a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15);
a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15);
a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15);

i32 __builtin_mips_modsub (i32, i32);
a64 __builtin_mips_mthlip (a64, i32);

q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15);
q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15);
v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15);
v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15);
v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15);
a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15);

v2q15 __builtin_mips_packrl_ph (v2q15, v2q15);
v2q15 __builtin_mips_pick_ph (v2q15, v2q15);
v4i8 __builtin_mips_pick_qb (v4i8, v4i8);

q31 __builtin_mips_preceq_w_phl (v2q15);
q31 __builtin_mips_preceq_w_phr (v2q15);
v2q15 __builtin_mips_precequ_ph_qbl (v4i8);
v2q15 __builtin_mips_precequ_ph_qbla (v4i8);
v2q15 __builtin_mips_precequ_ph_qbr (v4i8);
v2q15 __builtin_mips_precequ_ph_qbra (v4i8);
v2q15 __builtin_mips_preceu_ph_qbl (v4i8);
v2q15 __builtin_mips_preceu_ph_qbla (v4i8);
v2q15 __builtin_mips_preceu_ph_qbr (v4i8);
v2q15 __builtin_mips_preceu_ph_qbra (v4i8);
v2q15 __builtin_mips_precrq_ph_w (q31, q31);
v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15);
v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31);
v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15);

i32 __builtin_mips_raddu_w_qb (v4i8);
i32 __builtin_mips_rddsp (imm0_63);
v2q15 __builtin_mips_repl_ph (i32);
v4i8 __builtin_mips_repl_qb (i32);

a64 __builtin_mips_shilo (a64, i32);
v2q15 __builtin_mips_shll_ph (v2q15, i32);
v4i8 __builtin_mips_shll_qb (v4i8, i32);
v2q15 __builtin_mips_shll_s_ph (v2q15, i32);
q31 __builtin_mips_shll_s_w (q31, i32);
v2q15 __builtin_mips_shra_ph (v2q15, i32);
v2q15 __builtin_mips_shra_r_ph (v2q15, i32);
q31 __builtin_mips_shra_r_w (q31, i32);
v4i8 __builtin_mips_shrl_qb (v4i8, i32);

v2q15 __builtin_mips_subq_ph (v2q15, v2q15);
v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15);
q31 __builtin_mips_subq_s_w (q31, q31);
v4i8 __builtin_mips_subu_qb (v4i8, v4i8);
v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8);

void __builtin_mips_wrdsp (i32, imm0_63);
```

### 6.10.4 Compiler Builtins for Second Revision

The second revision of the DSP ASE introduces some new instructions for which there are equivalent new builtin functions in the compiler.

```
v4q7 __builtin_mips_absq_s_qb (v4q7);

v2q15 __builtin_mips_addqh_ph (v2q15, v2q15);
v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15);
q31 __builtin_mips_addqh_w (q31, q31);
q31 __builtin_mips_addqh_r_w (q31, q31);
v2i16 __builtin_mips_addu_ph (v2i16, v2i16);
v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16);
v4i8 __builtin_mips_adduh_qb (v4i8, v4i8);
v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8);

i32 __builtin_mips_append (i32, i32, imm0_31);
i32 __builtin_mips_balign (i32, i32, imm0_3);

i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8);
i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8);
i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8);

a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16);
a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16);
a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15);
a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15);
a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16);
a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16);
a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15);
a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15);

a64 __builtin_mips_madd (a64, i32, i32);
a64 __builtin_mips_maddu (a64, ui32, ui32);
a64 __builtin_mips_msub (a64, i32, i32);
a64 __builtin_mips_msubu (a64, ui32, ui32);

v2i16 __builtin_mips_mul_ph (v2i16, v2i16);
v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16);
q31 __builtin_mips_mulq_rs_w (q31, q31);
v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15);
q31 __builtin_mips_mulq_s_w (q31, q31);
a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16);
a64 __builtin_mips_mult (i32, i32);
a64 __builtin_mips_multu (ui32, ui32);

v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16);
v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31);
v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31);
i32 __builtin_mips_prepend (i32, i32, imm0_31);

v4i8 __builtin_mips_shra_qb (v4i8, i32);
v4i8 __builtin_mips_shra_r_qb (v4i8, i32);
v2i16 __builtin_mips_shrl_ph (v2i16, i32);

v2q15 __builtin_mips_subqh_ph (v2q15, v2q15);
v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15);
q31 __builtin_mips_subqh_w (q31, q31);
q31 __builtin_mips_subqh_r_w (q31, q31);
v2i16 __builtin_mips_subu_ph (v2i16, v2i16);
v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16);
v4i8 __builtin_mips_subuh_qb (v4i8, v4i8);
v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8);
```

### 6.10.5 Intrinsics for Atomic R-M-W

The bare-iron/ELF library includes a set of atomic read-modify-write operations which provide fast, protected access to shared memory locations (but not device registers) in the face of interrupts. In the case of processors which support the ll and sc instructions, and have the appropriate external hardware, they will also be multi-processor safe. These facilities can be used to implement semaphores, mutexes, counters, etc.

To use these functions include the header file *<mips/atomic.h>*. The functions are as follows:

```
uint32_t mips_atomic_bis(uint32_t *wp, uint32_t bits)
```

The atomic bit "test-and-set" operation: sets those bits in \*wp selected by non-zero bits in bits (e.g. \*wp |= bits), and returns the old value of \*wp.

```
uint32_t mips_atomic_bic(uint32_t *wp, uint32_t bits)
```

The atomic bit "test-and-clear" operation: clears those bits in \*wp selected by non-zero bits in bits (e.g. \*wp &= ~bits), and returns the old value of \*wp.

```
uint32_t mips_atomic_bcs(uint32_t *wp, uint32_t clr, uint32_t set)
```

A combined atomic bit "test-clear-and-set" operation: clears those bits in \*wp selected by non-zero bits in clr and sets those selected by set (e.g. \*wp = (\*wp & ~clr) | set). Returns the old value of \*wp.

```
uint32_t mips_atomic_swap(uint32_t *wp, uint32_t new)
```

The atomic "test-and-swap" operation: sets \*wp to new, and returns the old value of \*wp.
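Because each of these operations returns the old value of \*wp, a caller can tell whether it was the one that changed a bit. The sketch below shows that idea as a simple try-lock. It is a host-side model built on C11 `<stdatomic.h>`, not the library implementation: `model_atomic_bis()`, `model_atomic_bic()`, `try_lock()`, `unlock()` and `LOCK_BIT` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the documented mips_atomic_bis() behaviour:
   OR bits into *wp and return the previous value. */
static uint32_t model_atomic_bis(_Atomic uint32_t *wp, uint32_t bits)
{
    assert(wp != NULL);
    return atomic_fetch_or(wp, bits);
}

/* Host-side model of mips_atomic_bic(): clear bits, return the old value. */
static uint32_t model_atomic_bic(_Atomic uint32_t *wp, uint32_t bits)
{
    assert(wp != NULL);
    return atomic_fetch_and(wp, ~bits);
}

#define LOCK_BIT 0x1u   /* hypothetical: bit 0 of the word is the lock flag */

/* Returns 1 if this caller took the lock, i.e. the bit was clear before. */
static int try_lock(_Atomic uint32_t *lock)
{
    return (model_atomic_bis(lock, LOCK_BIT) & LOCK_BIT) == 0;
}

/* Release the lock by clearing the flag bit. */
static void unlock(_Atomic uint32_t *lock)
{
    (void) model_atomic_bic(lock, LOCK_BIT);
}
```

On the target, the same pattern would presumably call mips\_atomic\_bis() and mips\_atomic\_bic() on a plain uint32\_t word instead of the modelled functions.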
```
uint32_t mips_atomic_inc(uint32_t *wp)
```

Atomically increments \*wp, returning its old value.

```
uint32_t mips_atomic_dec(uint32_t *wp)
```

Atomically decrements \*wp, returning its old value.

```
uint32_t mips_atomic_add(uint32_t *wp, uint32_t val)
```

Atomically adds val to \*wp, returning its old value.

```
uint32_t mips_atomic_cas(uint32_t *wp, uint32_t new, uint32_t cmp)
```

Atomic "compare-and-swap": sets \*wp to new, but only if it originally equals cmp. It returns the original value of \*wp, whether or not it was updated.

Note that when the CPU does not include the ll and sc instructions, the operation is simulated, and will only be atomic if all interrupts are handled by the standard bare-iron/ELF library exception handler, where there is special fixup code.

### 6.10.6 Intrinsics for Data Prefetch

Some MIPS-Based CPUs support the pref instruction, which allows a programmer to optimize array processing loops (as used in many DSP algorithms) by explicitly prefetching the next block of data into the data cache before it is needed, to minimize the cache-miss latency of the following loads and stores. If it is done early enough, the data will already be in the cache by the time it is needed.

The bare-iron/ELF library includes a set of prefetch intrinsics to access these instructions. On CPUs which don't support the pref instruction these will be no-ops. To use the intrinsics include the header file *<mips/cpu.h>*.

```
void mips_prefetch (void *addr, int rw, int locality)
```

The value of addr is the address of the memory to prefetch. There are two further arguments: rw and locality. The value of rw is a compile-time constant one or zero; one means that the prefetch is preparing for a write to the memory address and zero means that the prefetch is preparing for a read. The value of locality must be a compile-time constant integer between zero and three.
A value of zero means that the data has no temporal locality, so it need not be left in the cache after the access. A value of three means that the data has a high degree of temporal locality and should be left in all levels of cache possible. Values of one and two mean, respectively, a low or moderate degree of temporal locality. For example:

```
j = mips_dcache_linesize / sizeof (a[0]);
for (i = 0; i < n; i++) {
    a[i] = a[i] + b[i];
    mips_prefetch (&a[i+j], 1, 1);
    mips_prefetch (&b[i+j], 0, 1);
    /* ... */
}
```

Data prefetch does not generate faults if addr is invalid, but the address expression itself must be valid. For example, a prefetch of p->next will not fault if p->next is not a valid address, but evaluation will fault if p is not a valid address.

Note that the mips\_prefetch arguments match the \_\_builtin\_prefetch intrinsic in GCC 3.x, for which it is an alias.

```
void mips_nudge (void *addr)
```

The MIPS-specific "nudge" (push to memory) operation. The addressed cache line is written back to memory and invalidated.

```
void mips_prepare_for_store (void *addr)
```

The MIPS-specific "prepare for store" operation. If the addressed line is not already in the cache, then a line is allocated for it without reading memory (possibly flushing another line from the cache), and the line is cleared to zero. Warning: since this may zero the whole cache line, make sure that you only operate on cache-line-sized chunks, with cache line alignment.

# **CPU Management**

The second major component of the bare-iron/ELF library run-time system consists of a set of support functions with which to initialize and maintain a MIPS architecture processor's caches, TLB and coprocessor registers, together with a powerful exception and interrupt handling mechanism, and support for remote source debugging of rommable code.

## 7.1 Cache Maintenance

The cache management function prototypes are supplied by including *<mips/cpu.h>*.
Many of these routines expect to be passed an address range to operate on, consisting of a starting *virtual address* and a byte count.

```
void mips_size_cache (void)
```

Size the caches, setting the following global variables:

- *mips\_icache\_size*, *mips\_icache\_linesize*, *mips\_icache\_ways*: The size (in bytes) of the primary instruction cache, the size of each cache line, and the number of ways of set associativity.
- *mips\_dcache\_size*, *mips\_dcache\_linesize*, *mips\_dcache\_ways*: Ditto for the primary data cache.

```
void mips_init_cache (void)
```

Size the caches as above, and initialize them. The function MUST be called after a hardware reset and before using the caches, otherwise they may be in an inconsistent state. This is normally called by the standard reset code. Do NOT call it from application code, as it may invalidate dirty cache lines in a writeback cache without actually writing them back to memory.

```
void mips_sync_icache (vaddr_t va, size_t n)
```

Synchronizes the I-cache with the D-cache, which is necessary when the instruction stream is modified by software (e.g. inserting software breakpoints, self-modifying code, etc).

```
void mips_clean_cache (vaddr_t va, size_t n)
```

Write back and invalidate entries matching the given address range from all caches. This is the routine most commonly called in device drivers before starting a DMA transfer, or after dynamically modifying executable code.

```
void mips_clean_dcache (vaddr_t va, size_t n)
```

Write back and invalidate entries matching the given address range from the data caches only - separate instruction caches are unchanged.

```
void mips_clean_icache (vaddr_t va, size_t n)
```

Invalidate entries matching the given address range from the instruction caches only - separate data caches are unchanged.

```
void mips_flush_cache (void)
```

Write back and invalidate all entries from all caches. The simplest way to completely synchronize caches and memory, but not necessarily the most efficient.
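Cache operations act on whole lines, so a (va, n) range that is not line-aligned also touches the partial lines at each end. The following sketch, in portable C, computes the line-aligned region covering a range. It assumes a power-of-two line size; `line_span_t` and `cache_line_span()` are hypothetical helpers, not library functions, and on the target the line size would be taken from mips\_dcache\_linesize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the line-aligned region covering a range. */
typedef struct {
    uintptr_t start;   /* address of the first cache line touched */
    size_t    len;     /* length in bytes, a multiple of the line size */
} line_span_t;

/* Round [va, va + n) outwards to cache-line boundaries.
   linesize must be a non-zero power of two. */
static line_span_t cache_line_span(uintptr_t va, size_t n, size_t linesize)
{
    line_span_t span;
    uintptr_t mask, end;

    assert(linesize != 0 && (linesize & (linesize - 1)) == 0);
    mask = (uintptr_t) linesize - 1;
    span.start = va & ~mask;            /* round the start address down */
    end = (va + n + mask) & ~mask;      /* round the end address up */
    span.len = (size_t) (end - span.start);
    return span;
}
```

If the computed span is larger than the requested range, the extra bytes belong to neighbouring data that shares those cache lines, which is worth checking before using an invalidating operation on a buffer.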
```
void mips_flush_dcache (void)
```

Write back and invalidate all entries from all data caches - separate instruction caches are unchanged.

```
void mips_flush_icache (void)
```

Invalidate all entries from all instruction caches - separate data caches are unchanged.

```
void mips_lock_icache (vaddr_t va, size_t n)
void mips_lock_dcache (vaddr_t va, size_t n)
void mips_lock_scache (vaddr_t va, size_t n)
```

On CPUs which support cache locking, these functions allow you to lock regions of code or data into the primary instruction, data or secondary caches respectively. Take care not to use the global *flush* functions after locking caches, as they will invalidate (and unlock) the locked cache lines.

## 7.2 TLB Maintenance

The functions listed below provide for initialization and maintenance of the CPU's memory management Translation Lookaside Buffer (TLB), if present. The TLB and memory management definitions are supplied by including *<mips/cpu.h>*.

```
void mips_init_tlb (void)
```

Initializes and invalidates the whole TLB.

```
unsigned int mips_tlb_size (void)
```

Returns the number of entries in the TLB.

```
void mips_tlbinval (tlbhi_t hi)
```

Probes the TLB for an entry matching hi, and if present invalidates it.

```
void mips_tlbinvalall (void)
```

Invalidates the entire TLB.

```
void mips_tlbri2 (tlbhi_t *phi, tlblo_t *plo0, tlblo_t *plo1, unsigned *pmsk, int index)
```

Reads the TLB entry specified by index, and returns the *EntryHi*, *EntryLo0*, *EntryLo1*, and *PageMask* parts in \*phi, \*plo0, \*plo1 and \*pmsk respectively.

```
void mips_tlbwi2 (tlbhi_t hi, tlblo_t lo0, tlblo_t lo1, unsigned msk, int index)
```

Writes hi, lo0, lo1 and msk into the TLB entry specified by index.

```
void mips_tlbwr2 (tlbhi_t hi, tlblo_t lo0, tlblo_t lo1, unsigned msk)
```

Writes hi, lo0, lo1 and msk into the TLB entry specified by the *Random* Register.
```
int mips_tlbprobe2 (tlbhi_t hi, tlblo_t *plo0, tlblo_t *plo1, unsigned *pmsk)
```

Probes the TLB for an entry matching hi and returns its index, or -1 if not found. If found, then the *EntryLo0*, *EntryLo1* and *PageMask* parts of the entry are also returned in \*plo0, \*plo1 and \*pmsk respectively.

```
int mips_tlbrwr2 (tlbhi_t hi, tlblo_t lo0, tlblo_t lo1, unsigned msk)
```

Probes the TLB for an entry matching hi and if present rewrites that entry, otherwise updates a random entry. A safe way to update the TLB.

## 7.3 Hardware Watchpoints

Some MIPS architecture CPUs provide one or more hardware watchpoint registers in Coprocessor 0 (these are separate from any EJTAG hardware breakpoint registers). The watchpoint registers generate a CPU exception when software loads or stores data, or executes instructions, within a programmable address range.

Different MIPS-Based CPUs implement very different watchpoint controls (number of watchpoints, type of access, physical/virtual address, address masking, and so on). To make this manageable and portable between different CPUs we have developed a generic API which is documented here.

These facilities are used by the bare-iron/ELF library remote debug stub to support *gdb*'s watchpoint facility, but you could also use them to implement profiling or debugging facilities within your own software.

To use the watchpoint API described here, include the file *<mips/watchpoint.h>*.

```
int _mips_watchpoint_init (void)
```

Initializes the watchpoint system and returns the number of hardware watchpoints available.

```
int _mips_watchpoint_howmany (void)
```

Just returns the number of hardware watchpoints, without re-initializing the sub-system.

```
int _mips_watchpoint_capabilities (int wpnum)
```

Returns the *capability* of watchpoint number *wpnum* (0 to n). Usually called after \_mips\_watchpoint\_init() to collect and cache each watchpoint's capability.
The capability is the bitwise OR of some or all of the values shown in Table 7.1.

**Table 7.1 Hardware Watchpoint Attributes**

| Capability | Meaning |
|-----------------------|---------|
| MIPS_WATCHPOINT_SSTEP | Hardware single-step supported. |
| MIPS_WATCHPOINT_VALUE | Can qualify the watchpoint with the value of the data being read or written from/to memory. |
| MIPS_WATCHPOINT_ASID | Can qualify match using the virtual address-space ID (ASID). |
| MIPS_WATCHPOINT_VADDR | Matches against virtual address (if not set then matches against physical address). |
| MIPS_WATCHPOINT_RANGE | Supports an address range (arbitrarily aligned start and end address). |
| MIPS_WATCHPOINT_MASK | Supports an address mask (size must be a power-of-two, and start address aligned on a matching boundary). |
| MIPS_WATCHPOINT_DWORD | Only supports an address match within a single 8 byte aligned double word; if an address range/mask is supported then the minimum size and alignment is 8 bytes. |
| MIPS_WATCHPOINT_WORD | Only supports an address match within a single 4 byte aligned word; if an address range/mask is supported then the minimum size and alignment is 4 bytes. |
| MIPS_WATCHPOINT_X | Instruction fetch breakpoint supported. |
| MIPS_WATCHPOINT_R | Data read breakpoint supported. |
| MIPS_WATCHPOINT_W | Data write breakpoint supported. |

```
int _mips_watchpoint_set (int type, int asid, vaddr_t va, paddr_t pa, size_t size)
```

Creates a new watchpoint where: *type* is the OR of the last three capabilities (i.e. instruction fetch, read and/or write); *asid* is the virtual address space ID (or -1 for global); *va* is the virtual address of the start of the watchpoint region; *pa* is the physical address (can be zero if virtual address matching is supported); and *size* is the size of the watchpoint region.
For CPUs which support an address mask, *va* and *size* can be arbitrarily aligned, and the code will compute the smallest aligned region which fits around them. Beware that this region can be considerably larger than the one requested, and so cause a large number of false watchpoint hits. The return values, shown in Table 7.2, indicate success or failure.

**Table 7.2 Watchpoint Return Codes**

| Return Code | Meaning |
|-----------------|---------|
| MIPS_WP_OK | Succeeded. |
| MIPS_WP_NOTSUP | This type of watchpoint is not supported, or possibly you've asked for a watchpoint region which is larger than can be supported. |
| MIPS_WP_INUSE | All hardware resources which support this type of watchpoint are in use. |
| MIPS_WP_NOMATCH | Matching watchpoint cannot be found (see \_mips\_watchpoint\_clear() below). |
| MIPS_WP_OVERLAP | Address range would overlap the debugger's own code, data or stack. |
| MIPS_WP_BADADDR | The pa value is zero and virtual address matching is not supported. |

```
int _mips_watchpoint_clear (int type, int asid, vaddr_t va, size_t size)
```

Deletes a watchpoint: the parameters must match those used when the watchpoint was created by \_mips\_watchpoint\_set(). See \_mips\_watchpoint\_set() for the return codes.

```
int _mips_watchpoint_set_callback (int asid, vaddr_t va, size_t len)
```

A callback function which you can (optionally) provide. When a new watchpoint is about to be added, your code has a last chance to check the computed address range to make sure that it doesn't overlap with its own code or data (which could cause recursive watchpoint traps). It should return MIPS\_WP\_OK or MIPS\_WP\_OVERLAP. If you don't provide this function then all watchpoints are allowed.
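A callback of this kind might look like the sketch below, which rejects any watchpoint range that intersects a single protected region. The typedef, the numeric values of the return codes, and the `protected_start` and `protected_size` variables are placeholders so that the sketch compiles on a host; on the target the real definitions come from *<mips/watchpoint.h>* and the values may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder definitions (assumptions, for host compilation only). */
typedef uintptr_t vaddr_t;
#define MIPS_WP_OK      0
#define MIPS_WP_OVERLAP 1

/* Hypothetical protected region, e.g. the debug stub's private data. */
static vaddr_t protected_start = 0x80001000u;
static size_t  protected_size  = 0x1000u;

/* Reject any watchpoint whose range [va, va + len) intersects the
   protected region; allow everything else. */
int _mips_watchpoint_set_callback(int asid, vaddr_t va, size_t len)
{
    vaddr_t wp_end = va + len;
    vaddr_t pr_end = protected_start + protected_size;

    (void) asid;              /* address-space ID is ignored in this sketch */
    assert(wp_end >= va);     /* the range must not wrap around */

    if (va < pr_end && protected_start < wp_end)
        return MIPS_WP_OVERLAP;
    return MIPS_WP_OK;
}
```

Real code would probably need to check several regions (code, data and stack), but the interval test for each one is the same.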
```
int _mips_watchpoint_hit (const struct xcptcontext *xcp, vaddr_t *vap, size_t *sizep)
```

Called by your hardware watchpoint exception handler (usually the debug stub) to check whether the exception context *xcp* was a true watchpoint hit. If so the return value will be non-zero, and contain one of MIPS\_WATCHPOINT\_R, MIPS\_WATCHPOINT\_W or MIPS\_WATCHPOINT\_X to indicate the type of access. If in addition the bit MIPS\_WATCHPOINT\_INEXACT is set then this was a watchpoint exception, but it was based on a loose address mask, and this access was outside of the range originally requested by \_mips\_watchpoint\_set(); your code must single-step over this instruction and then continue.

```
void _mips_watchpoint_remove (void)
```

Called by the debug stub, or your watchpoint exception handler, to disable hardware watchpoints, e.g. before single-stepping over an instruction which may trigger the watchpoint.

```
void _mips_watchpoint_insert (void)
```

Called by the debug stub, or watchpoint exception handler, to enable hardware watchpoints, e.g. after single-stepping over an instruction and before continuing execution.

```
void _mips_watchpoint_reset (void)
```

Clears all watchpoints.

## 7.4 System Coprocessor (CP0) Intrinsics

All MIPS-Based CPUs contain a "System Control" subsystem known as Coprocessor 0, or CP0. This is used by operating systems and other low-level software to control interrupts, exceptions, memory management, caches, etc. These intrinsics provide very low-level access to the CP0 registers from C and C++ code. Other intrinsics which give access to "user-level" instructions and registers are described in a separate chapter, see Chapter 6, "MIPS® Architecture Intrinsics" on page 61.

The header file *<mips/cpu.h>* (which in turn includes the appropriate CPU-specific header) defines the intrinsics shown in Table 7.3 and described in the following subsections. The "\*" symbol represents up to six separate intrinsics.

**Table 7.3 Register Access Intrinsics**

| * | Arguments | Operation |
|-----|------------------------------|-----------|
| get | () | Returns the register value. |
| set | (unsigned val) | Sets the register to val, and returns void. |
| xch | (unsigned val) | Sets the register to val, and returns the **old** register value. |
| bis | (unsigned set) | Bit set (reg \|= set): returns the **old** register value. Only defined for registers with bit-fields. |
| bic | (unsigned clr) | Bit clear (reg &= ~clr): returns the **old** register value. Only defined for registers with bit-fields. |
| bcs | (unsigned clr, unsigned set) | Bit clear and set (reg = (reg & ~clr) \| set): returns the **old** register value. Only defined for registers with bit-fields. |

### 7.4.1 Common CP0 Registers

Some of the CP0 registers are common between almost all MIPS-Based CPU families, and the intrinsics to access these have the common prefix mips\_. Remember though that even for the common registers, the internal bit definitions are not necessarily the same across all CPU types. Make sure that you include the generic *<mips/cpu.h>*, and not *<mips/m32c0.h>*, or any of the CPU-specific header files.

**N.B.** The intrinsics which manipulate the Coprocessor registers do not provide atomicity in the presence of interrupts or other exceptions. This can be particularly important if you are changing the *Cause* or *Status* registers. If possible, avoid read-modify-write operations on the *Status* Register: write only constant values, or stored values manipulated only by atomic operations, unless you know that interrupts are already disabled (e.g. because you're in an exception handler). Ensure that interrupts are disabled when you update the *Cause* Register.

```
mips_*sr
```

(i.e. *mips\_getsr*, *mips\_setsr*, *mips\_xchsr*, *mips\_bissr*, *mips\_bicsr*, *mips\_bcssr*). Operations on the *Status* Register (CP0 register 12). See the atomicity warning above.

```
mips_*cr
```

Operations on the *Cause* Register (CP0 register 13). See warning above.

```
mips_getcount, mips_setcount
mips_getcompare, mips_setcompare
```

Operations on the *Count* and *Compare* Registers (CP0 registers 9 and 11). Available on most modern MIPS architecture CPUs, these implement an on-chip timer.

```
mips_getprid
    Return the read-only PrID Register (CP0 register 15).
    See <mips/prid.h> for a list of known values.

mips_*config
    Operations on Config Register (CP0 register number varies).

mips_*ecc
    Operations on ECC Register (CP0 register 26), used for cache
    error correction on some MIPS III + CPUs.

mips_*context
    Operations on the Context Register (CP0 register 4).

mips_*pagemask
    Operations on the PageMask Register (CP0 register 5).

mips_*wired
    Operations on the Wired Register (CP0 register 6).

mips_*entrylo
    Operations on the EntryLo Register (CP0 register 2).

mips_*entryhi
    Operations on the EntryHi Register (CP0 register 10).

mips_*taglo
mips_*taghi
    Operations on TagLo and TagHi registers (CP0 registers 28 and 29),
    used for cache testing and maintenance on many MIPS architecture CPUs.

mips_*watchlo
mips_*watchhi
    Operations on WatchLo and WatchHi registers (CP0 registers 18 and 19),
    used for hardware watchpoints on many MIPS III + CPUs.
```

### 7.4.2 CP0 Registers of MIPS32®/MIPS64® Architecture

The include files *<mips/m32c0.h>* and *<mips/m32tlb.h>* define the Coprocessor registers and memory-management unit of CPUs conforming to the MIPS32/MIPS64 specifications. They include the following functions:

```
mips32_*config0
```

Operations on the *Config0* Register (CP0 register 16, select 0), also available via the generic mips\_\*config functions described above.

```
mips32_getconfig1
    Returns the Config1 Register (CP0 register 16, select 1).

mips32_getconfig2
    Returns the Config2 Register (CP0 register 16, select 2).

mips32_getconfig3
    Returns the Config3 Register (CP0 register 16, select 3).

mips32_getwatchlo(int sel)
    Return the WatchLo Register numbered sel.

mips32_setwatchlo(int sel, unsigned int val)
    Set the WatchLo Register numbered sel to val.

mips32_getwatchhi(int sel)
    Return the WatchHi Register numbered sel.

mips32_setwatchhi(int sel, unsigned int val)
    Set the WatchHi Register numbered sel to val.

mips32_*errctl
    Operations on the ErrCtl Register (CP0 register 26, select 0).

mips32_*datalo
    Operations on the DataLo Register (CP0 register 28, select 1).
```

### 7.4.3 CP0 Registers of MIPS32®/MIPS64® Release 2 Architecture

The MIPS32 Release 2 ISA defines a few new Coprocessor 0 registers, also defined in the include file *<mips/m32c0.h>*.

```
mips32_*pagegrain
    Operations on the MIPS32 Release 2 PageGrain Register (CP0 register 5, select 1).

mips32_*hwrena
    Operations on the MIPS32 Release 2 HWREna Register (CP0 register 7, select 0).

mips32_*intctl
    Operations on the MIPS32 Release 2 IntCtl Register (CP0 register 12, select 1).

mips32_*srsctl
    Operations on the MIPS32 Release 2 SRSCtl Register (CP0 register 12, select 2).

mips32_*srsmap
    Operations on the MIPS32 Release 2 SRSMap Register (CP0 register 12, select 3).

mips32_*ebase
    Operations on the MIPS32 Release 2 EBase Register (CP0 register 15, select 1).
```

### 7.4.4 Shadow Sets of MIPS32®/MIPS64® Release 2 Architecture

The MIPS32 Release 2 architecture adds support for alternative "shadow" banks of CPU general purpose registers, for use by low-latency interrupt and exception handlers. These intrinsics allow C code to read and write registers in other shadow sets, and are defined in the include file *<mips/m32c0.h>*.

```
uint32_t _mips32r2_xchsrspss(uint32_t set)
```

Sets the *PSS* field in the *SRSCtl* Register to set, allowing access to that shadow set with the following intrinsics. Returns the old value of the *PSS* field.
``` uint32_t _mips32r2_rdpgpr(int regno) ``` Returns register number regno from the selected shadow set. The regno argument must be a constant between 0 and 31. ``` void _mips32r2_wrpgpr(int regno, uint32_t val) ``` Sets register number regno in the selected shadow set to val. The regno argument must be a constant between 0 and 31. # 7.4.5 CP0 Registers of MIPS® MT ASE The include file *<mips/mt.h>* defines the Coprocessor registers introduced by the MT ASE, and includes the following C access functions: ``` mips32_*mvpcontrol Operations on the MVPControl Register (CP0 Register 0, Select 1). mips32_*mvpconf0 Operations on the MVPConf0 Register (CP0 Register 0, Select 2). mips32_*mvpconf1 Operations on the MVPConf1 Register (CP0 Register 0, Select 3). ``` ``` mips32_*vpecontrol Operations on the VPEControl Register (CP0 Register 1, Select 1). mips32 *vpeconf0 Operations on the VPEConf0 Register (CP0 Register 1, Select 2). mips32 *vpeconf1 Operations on the VPEConf1 Register (CP0 Register 1, Select 3). mips32_*yqmask Operations on the YQMask Register (CP0 Register 1, Select 4). mips32_*vpeschedule Operations on the VPESchedule Register (CP0 Register 1, Select 5). mips32_*vpeschefback Operations on the VPEScheFback Register (CP0 Register 1, Select 7). mips32_*tcstatus Operations on the TCStatus Register (CP0 Register 4, Select 1). mips32_*tcpc Operations on the TCPC Register (CP0 Register 4, Select 2). mips32_*tchalt Operations on the TCHalt Register (CP0 Register 4, Select 3). mips32_*tccontext Operations on the TCContext Register (CP0 Register 4, Select 4). mips32_*tcschedule Operations on the TCSchedule Register (CP0 Register 4, Select 5). mips32_*tcschefback Operations on the TCScheFback Register (CP0 Register 4, Select 6). mips32 *srsconf* Operations on the SRSConf0-4 Registers (CP0 Register 6, Select 1-5) The MT ASE also permits access to registers with a different thread context or virtual processor. 
```
```
mips32_mt_settarget (int vpe, int tc)
    Selects the target VPE and TC number for the following access functions.
mips32_mt_getc0status()
    Return the CP0 Status Register of the selected TC/VPE.
mips32_mt_setc0status(int val)
    Set the CP0 Status Register of the selected TC/VPE.
mips32_mt_getc0cause()
    Return the CP0 Cause Register of the selected TC/VPE.
mips32_mt_setc0cause(val)
    Set the CP0 Cause Register of the selected TC/VPE.
mips32_mt_getc0config()
    Return the CP0 Config Register of the selected TC/VPE.
mips32_mt_setc0config(val)
    Set the CP0 Config Register of the selected TC/VPE.
mips32_mt_getc0config1()
    Return the CP0 Config1 Register of the selected TC/VPE.
mips32_mt_setc0config1(val)
    Set the CP0 Config1 Register of the selected TC/VPE.
mips32_mt_getc0ebase()
    Return the CP0 EBase Register of the selected TC/VPE.
mips32_mt_setc0ebase(val)
    Set the CP0 EBase Register of the selected TC/VPE.
mips32_mt_getsp()
    Return the stack pointer ($29) of the selected TC/VPE.
mips32_mt_setsp(val)
    Set the stack pointer ($29) of the selected TC/VPE.
mips32_mt_getgp()
    Return the global pointer ($28) of the selected TC/VPE.
mips32_mt_setgp(val)
    Set the global pointer ($28) of the selected TC/VPE.
mips32_mt_getvpecontrol()
    Return the CP0 VPEControl Register of the selected TC/VPE.
mips32_mt_setvpecontrol(val)
    Set the CP0 VPEControl Register of the selected TC/VPE.
mips32_mt_getvpeconf0()
    Return the CP0 VPEConf0 Register of the selected TC/VPE.
mips32_mt_setvpeconf0(val)
    Set the CP0 VPEConf0 Register of the selected TC/VPE.
mips32_mt_gettcstatus()
    Return the CP0 TCStatus Register of the selected TC/VPE.
mips32_mt_settcstatus(val)
    Set the CP0 TCStatus Register of the selected TC/VPE.
mips32_mt_gettcbind()
    Return the CP0 TCBind Register of the selected TC/VPE.
mips32_mt_settcbind(val)
    Set the CP0 TCBind Register of the selected TC/VPE.
mips32_mt_gettcrestart()
    Return the CP0 TCRestart Register of the selected TC/VPE.
mips32_mt_settcrestart(val)
    Set the CP0 TCRestart Register of the selected TC/VPE.
mips32_mt_settchalt(val)
    Set the CP0 TCHalt Register of the selected TC/VPE.
mips32_mt_gettccontext()
    Return the CP0 TCContext Register of the selected TC/VPE.
mips32_mt_settccontext(val)
    Set the CP0 TCContext Register of the selected TC/VPE.
```

# 7.5 Miscellaneous System Support

The following generic MIPS system support functions are defined in the include file *<mips/cpu.h>*.

```
void mips_wbflush (void)
```

Drain the write buffer. All stores issued prior to the call are guaranteed to have been written to memory or device by the time the function returns. It should be called between writing to device control registers and reading their status/data registers. On some CPUs it is also necessary to call it between successive writes to the same register, to prevent word-gathering write buffers from swallowing some of the writes.

```
void mips_sync (void)
```

On modern MIPS-based CPUs this generates a sync instruction. This is almost, but not quite, the same as mips\_wbflush(): it is a memory *barrier* which guarantees that all memory accesses preceding this instruction will be completed before any accesses which follow it. It says nothing, though, about external state such as interrupts, and on simpler CPUs with blocking loads it may be treated as a no-op.

```
uint8_t mips_get_byte (void *addr, int *err)
uint16_t mips_get_half (void *addr, int *err)
uint32_t mips_get_word (void *addr, int *err)
uint64_t mips_get_dword (void *addr, int *err)
```

Return the byte, halfword, word, or dword at address addr. If the address is invalid, then \*err may be set to a non-zero value; otherwise \*err is unchanged. You can use these functions when accessing arbitrary memory locations outside of your program, to ensure that peculiarities of your system or CPU address map are handled correctly.
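Because \*err is left unchanged on success, the caller has to clear it before each call. The following is a minimal sketch of that calling pattern, not a definitive implementation. The wrapper name probe\_word is hypothetical, and the stand-in for mips\_get\_word() under the #else branch exists only so the sketch can be compiled off-target; it is an assumption and does not reflect how the library detects invalid addresses.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __mips__
#include <mips/cpu.h>     /* real declaration when building for the target */
#else
/* Host-only stand-in (an assumption, for illustration only): treats NULL as
   the only invalid address and otherwise performs an ordinary load. */
static uint32_t mips_get_word(void *addr, int *err)
{
    if (addr == NULL) {
        *err = 1;
        return 0;
    }
    return *(volatile uint32_t *)addr;
}
#endif

/* Hypothetical wrapper: returns 0 and stores the word in *out on success,
   or -1 if the access was flagged as invalid. */
static int probe_word(void *addr, uint32_t *out)
{
    int err = 0;                          /* clear first: untouched on success */
    uint32_t value = mips_get_word(addr, &err);

    if (err != 0)
        return -1;                        /* *out is left untouched */
    *out = value;
    return 0;
}
```

Keeping the error flag local to the wrapper avoids a stale non-zero value from an earlier probe being mistaken for a new failure.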
```
int mips_put_byte (void *addr, uint8_t val)
int mips_put_half (void *addr, uint16_t val)
int mips_put_word (void *addr, uint32_t val)
int mips_put_dword (void *addr, uint64_t val)
```

Store a byte, halfword, word, or dword val to arbitrary address addr. If the address is invalid, then a non-zero value may be returned; otherwise they return zero.

# 7.6 Floating Point Coprocessor (CP1)

The generic header file *<mips/fpa.h>* defines constants and functions for controlling the floating point coprocessor (CP1) and its register set.

```
int fpa_enable (int fast)
```

Probes to see if CP1 is present. If so, it is initialized, CP1 instructions are enabled, and 1 is returned. If it is not present, then CP1 instructions are disabled, and 0 is returned. If fast is non-zero then, if possible, the FPU is set to "performance mode", where IEEE-754 traps will not be taken for denormalized values, which will instead be flushed or rounded.

```
void fpa_save (struct fpactx *ctx)
```

Save all the floating point data registers and coprocessor state into the structure pointed to by ctx.

```
void fpa_restore (const struct fpactx *ctx)
```

Restore all the registers and coprocessor state from the structure pointed to by ctx.

```
unsigned fpa_getrid (void)
```

Returns CP1 control register 0, the read-only floating point *RevisionID* Register.

```
fpa_*sr
```

Operations on CP1 control register 31, the floating point control and status register. See Section 7.4 "System Coprocessor (CP0) Intrinsics" for a description of '\*'.

## 7.6.1 Coprocessor 1 Emulation

The run-time system includes a complete MIPS coprocessor 1 (floating point) instruction emulator. It can emulate all floating point instructions when there is no hardware FPU, or just those instructions with operands that the FPU cannot handle (e.g. denormalized values, underflow, etc.). This library module is called libcs3-mips-cp1.a.
The only public interface to the module is:

```
void _cop1_init (int emulateall);
```

This function installs the appropriate exception or interrupt handler: a non-zero value for emulateall installs full emulation via the CoProcessor Unusable (XCPTCPU) exception, whilst a zero value installs only the floating point interrupt handler (or XCPTFPE exception handler on an R4000 CPU and above). You'll probably never need to call it yourself - it is normally invoked automatically by the standard run-time startup code.

A faster alternative to trap-based coprocessor emulation is to use the compiler's **-msoft-float** option.

# 7.7 IEEE-754 Floating Point Emulation Library

The aforementioned coprocessor 1 emulator (libcs3-mips-cp1.a) calls another library which emulates each floating-point operation. This floating-point instruction emulation library for the bare-iron/ELF environment is named libcs3-mips-fpemu.a, and is specified to the linker as -lcs3-mips-fpemu.

The libcs3-mips-fpemu.a library implements single- and double-precision IEEE-754 floating point, but using only integer instructions.

The two libraries are invoked from a trap-based FPU instruction emulator (to fix up exceptional conditions, or when your code was built for a hardware FPU which is absent). The emulator is implemented in a library named libcs3-mips-cp1.a and is specified to the linker as -lcs3-mips-cp1.

To use the trap-based FPU emulator, this set of compiler switches must be used:

The library provides this functionality:

• A pedantic emulation of the MIPS floating point unit, which is used to implement the trap-based FPU hardware emulation. This uses function names like ieee754dp\_add.

You'll find a primer on floating point and its implementation in the MIPS architecture in [Sweet99].
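To give a feel for what "using only integer instructions" involves, the sketch below splits the 64-bit pattern of an IEEE-754 double into its sign, exponent and fraction fields with shifts and masks, and uses them to spot a denormalized value, one of the operand classes mentioned in Section 7.6.1. It is an illustration under stated assumptions only: the names dp\_fields, dp\_unpack and dp\_is\_denormal are hypothetical and are not taken from libcs3-mips-fpemu.a.

```c
#include <stdint.h>

/* Hypothetical type and helpers for illustration; not the library's API. */
struct dp_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 11 bits, biased by 1023 */
    uint64_t fraction;   /* low 52 bits */
};

/* Split a double's raw bit pattern using integer operations only. */
static struct dp_fields dp_unpack(uint64_t bits)
{
    struct dp_fields f;

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7ffu);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}

/* A denormalized value has a zero exponent field and a non-zero fraction. */
static int dp_is_denormal(uint64_t bits)
{
    struct dp_fields f = dp_unpack(bits);

    return f.exponent == 0 && f.fraction != 0;
}
```

For example, the bit pattern 0x3FF0000000000000 (1.0) unpacks to sign 0, exponent 0x3ff and fraction 0, while the pattern 0x0000000000000001 is reported as denormalized.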
# References

#### [Sweet99]

*See MIPS Run*, Dominic Sweetman (of MIPS Technologies), 1999, Morgan Kaufmann, ISBN 1-55860-410-3. We have to give special mention to this comprehensive guide to the MIPS Architecture and programming; firstly because one of us wrote it, and secondly because if you read it carefully enough we'll save time on support work.

#### [Farq94]

*The MIPS Programmers Handbook*, Erin Farquhar & Philip Bunce, 1994, Morgan Kaufmann, ISBN 1-55860-297-6. Example-based programming book aimed at small MIPS-based systems.

#### [SGI96]

*MIPSpro™ Assembly Language Programmer's Guide*, Silicon Graphics Inc.

#### [Kane92]

*MIPS RISC Architecture*, Gerry Kane and Joe Heinrich, 1992, Prentice Hall, ISBN 0-13-584210-7. Reference manual to MIPS instructions, focussed on the machine instruction level.

#### [Kern88]

*The C Programming Language* (Second Edition), Brian W. Kernighan and Dennis M. Ritchie, 1988, Prentice Hall, ISBN 0-13-110362-8. Throw away all those cheerfully coloured fat books with big letters and lots of pictures. If you want to program in C you need this and nothing else.

#### [Lewine91]

*POSIX Programmer's Guide*, Donald Lewine, 1991, O'Reilly, ISBN 0-937175-73-0. An introduction to and complete set of manual pages for the POSIX.1 programming interface.

Then there are reference works; we need to put these in, but you won't read them unless you have to:

#### [POSIX88]

*IEEE Standard 1003.1-1988*, Institute of Electrical and Electronics Engineers Inc., 1988.

#### [ABI]

*System V Applications Binary Interface - Revised Edition*, Unix System Laboratories, Prentice Hall, ISBN 0-13-877598-2.

#### [MIPSABI]

*System V ABI MIPS Processor Supplement*, Unix System Laboratories, Prentice Hall, ISBN 0-13-880170-3.
#### [ELF]

*Understanding ELF Object Files and Debugging Tools*, Mary Lou Nohr (Editor), Prentice Hall, ISBN 0-13-091109-7.

#### [MD00410]

*MIPS® SDE for Linux Getting Started Guide*, MIPS Technologies, Inc. The document which describes the SDE toolchain configured for native development of Linux/MIPS kernels and applications.

#### [MD00374]

*MIPS32® Architecture for Programmers Volume IV-e: MIPS® DSP Application-Specific Extension to the MIPS32® Architecture*, MIPS Technologies, Inc.

#### [MD00378]

*MIPS32® Architecture for Programmers Volume IV-f: MIPS® MT Application-Specific Extension to the MIPS32® Architecture*, MIPS Technologies, Inc.

You can't (so far as we know) buy the following GNU manuals, but they're provided as part of the toolchain:

#### [Binutils]

All the object-code tools except the linker itself, which gets a separate manual [Ld].

#### [Conv]

The SDE-specific ELF file conversion tool (sde-conv).

#### [Cpp]

The GNU C pre-processor; only for specialists.

#### [Gcc]

The compiler manual. Serious users should think about reading this through one time.

#### [Gdb]

The debugger manual. Probably for reference only.

#### [Gprof]

The profiler manual. Read this if you're planning to do performance analysis.

#### [Ld]

The linker manual.

#### [Make]

Read this if you're keen to create makefiles even more exciting than those in the examples.

# **Revision History**

Change bars (vertical lines) in the margins of this document indicate significant changes in the document since its last release. Change bars are removed for changes that are more than one revision old.

This document may refer to Architecture specifications (for example, instruction set descriptions and EJTAG register definitions), and change bars in these sections indicate changes since the previous version of the relevant Architecture document.
| Revision | Date | Description |
|----------|-------------------|-------------|
| 1.00 | April 8, 2008 | First external version |
| 1.01 | April 9, 2008 | changed/SDE notation to \$SDETOP |
| 1.02 | April 28, 2008 | • predefined macros - some have instead of<br>• added porting chapter |
| 1.03 | May 27, 2008 | More accurate description on how to set up MDI target. |
| 1.04 | May 27, 2008 | Compiler chapter: Typo in mtune table - missing 34kf1_1 |
| 1.05 | June 20, 2008 | • mtune table - 24k and 24ke replaced by 24kc and 24kec |
| 1.06 | July 25, 2008 | • mtune table - added 74f3_2, 4ksd, 4ksc<br>• remove mention of bestgp |
| 1.07 | November 18, 2008 | Make appropriate for NewLib/CS3 library. |
| 1.50 | April 06, 2009 | • Moved MIPS intrinsics from SDE Library Document.<br>• Updated FP Software Emulation Library names. |
| 1.51 | April 07, 2009 | Minor Typos fixed. |
| 1.52 | April 08, 2009 | • Be more consistent in mentioning that only MIPS32R2 library object files are delivered.<br>• More Minor Typos |
| 1.53 | June 01, 2009 | • Linker script name change for MIPSSIM<br>• Typo in FP emulator library name. |