Actions

icon Post
text/html Subscribe
text/html Unsubscribe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Revised versions of MIPS TLS ABI specification


  • To: mips-tls@xxxxxxxxxxxxxxxx
  • Subject: Revised versions of MIPS TLS ABI specification
  • From: Mark Mitchell <mark@xxxxxxxxxxxxxxxx>
  • Date: Thu, 28 Oct 2004 12:47:52 -0700

I've attached a revised version of the specification.

As you will see from the revision history at the top, the only changes I've made are the incorrect uses of "$(3)" found by Nigel.

I've also added an Open Issues section, where I've attempted to record the issues raised thus far. My priority (well, MontaVista's priority) is to get the basics nailed down so that we can start implementing them ASAP. That means that I would prefer to get consensus about the basics of the specification, even if there are some bits that we want to add in the future.

I would like to get the GD/LD bits specified first, so we can start implementing; then we can come back and LE/IE. Thus, I think the following issues are all things that we can decide later, given that at the moment we're talking only about General Dynamic and Local Dynamic.
1. Add |-mxgot| support.

We want to do this, for sure, but it does not affect General Dynamic, and it is clear that this can be added to Local Dynamic without impacting the currently specified functionality.

2. The code sequences given for the Local Dynamic Model limit the size of the TLS data area to 2GB.

If we decide we want to do this, we can again do it in a way that is forward compatible with the current specification.

3. Should we use Model I or Model II from Drepper's paper?

This issue does not arise for Global/Local Dynamic.

4. Does __tls_get_addr need a special calling convention?

This issue does not matter until we start doing linker optimizations. Until that time, the compiler can be conservative, and assume this is an ordinary call.

The following issue does need resolution:

1. Should we extend the 32K Local Dynamic Model to 64K by using a biased offset?

Daniel and I thought this would be more trouble than it's worth; Thiemo thinks otherwise. Are there any other opinions?

Once this issue is resolved, what is the next step? Would people like us to publish the draft on our web site? Or should it go on mips-linux.com? We intend to start doing patches to implement this stuff, but it will be more acceptable to the GCC/Binutils/GLIBC maintainers if there is a specification that we can reference.

--
Mark Mitchell
CodeSourcery, LLC
(916) 791-8304
mark@xxxxxxxxxxxxxxxx

    Revision History

2004-10-28

    * Removed incorrect |($3)| from Code Sequence I in the Local Dynamic
      Model.
    * Added missing |($3)| to Code Sequene II in the Local Dynamic Model.


    Open Issues

    * Add |-mxgot| support.

      The specification as written does not currently contain sequences
      for compilation with |-mxgot|.

    * The code sequences given for the Local Dynamic Model limit the
      size of the TLS data area to 2GB.

      Should we define an additional code sequence to permit a larger
      TLS data area?

    * Should |__tls_get_addr| have a special calling convention?

      Unless we make __tls_get_addr a non-standard function is known not
      to clobber $gp, then the linker optimizations are going to have to
      cope with assembler-generated call sequences in some cases, where
      the jalr and its delay slot are followed immediately by a gp
      restore. This occurs in O32 when compiler-generated
      "explicit-relocs" are not enabled, and in gcc-3.4 explicit-relocs
      are disabled when unit-at-a-time optimization is not enabled.

    * Should we use Model I or Model II for the TCB, as defined in
      Drepper's paper?

      Thiemo Seufer suggests that Model I is superior. Daniel Jacobowitz
      believes that they are approximately equivalent.

    * Should we extend the 32K Local Dynamic Model to 64K by using a
      biased offset?

      Thiemo Seufer suggests that we can use a bias of 0x7ff0. Daniel
      Jacobowitz and Mark Mitchell feel that this is more trouble than
      its worth.


    Closed Issues

    * Do the relocation numbers used in this specification conflict with
      those used by any existing MIPS toolchains?

      Nigel Stephens says that we should proceed with the numbers in the
      specification, but changes might be necessary if tools vendors
      complain.


    Overview

This document presents a design for implementing Thread Local Storage
(TLS) for MIPS Linux, in both 32-bit and 64-bit mode. This design
specifies the code that must be generated by the compiler, the
relocations that must be generated by the assembler, and the processing
that must be performed by the linker.


    Design Choices

   1. There are no available hardware registers to designate as the
      thread register. Therefore, kernel magic will be used to make the
      thread pointer available to userspace. This specification does not
      proscribe a mechanism for that; the mechanism for obtaining the
      thread pointer will be encapsulated in the |__tls_get_addr| function.
   2. Use TLS Variant II (in which the TLS data areas precede the TCB in
      memory).

      As noted in Drepper's paper, this design permits the compiler to

      generate efficient code for the case that the main executable
      accesses TLS variables from the executable itself.

   3. The |__tls_get_addr| function has the prototype:

       extern void *__tls_get_addr (tls_index *ti);

     where the type 'tls_index' is defined as::

       typedef struct {
         unsigned long ti_module;
         unsigned long ti_offset;
       } tls_index;

     The type 'unsigned long' is used because it is a 32-bit type in
     32-bit mode and a 64-bit type in 64-bit mode; thus, the members
     will fit correctly into two consecutive GOT entries in both
     modes.

   4. The Initial Exec and Local Exec models are not yet specified.

      These models require that the compiler be able to directly access
      the thread pointer without using |__tls_get_addr|. Whether or not
      that is possible will depend on the kernel mechanism used to
      implement the thread pointer.

   5. The compiler is not allowed to schedule the sequences below.

      The sequences below must appear exactly as written in the code
      generated by the compiler. This restriction is present because we
      have not yet specified the Initial Exec and Local Exec models, and
      so it is not clear what linker optimizations may be possible. In
      order to facilitate adding linker optimizations in the future,
      without recompiling current code, the compiler is restricted from
      scheduling these sequences.


    Conventions

In what follows, all references to registers other than |$2| (when it is
used as the return register) , |$4| (when it used as an argument
register), |$25| (the address of a called function), and |$28| (the
global pointer) are arbitrary; the compiler is free to use other
registers instead.

Where |...| appears in a code sequence the compiler may insert zero or
more arbitrary instructions.


    General Dynamic TLS Model

Code sequence (32-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 addiu $4, $28, %tlsgd(x)               R_MIPS_TLS_GD   x

Outstanding relocations (32-bit mode):

    GOT[n]                                      R_MIPS_TLS_DTPMOD32 x
    GOT[n+1]                                    R_MIPS_TLS_DPTOFF32 x

Code sequence (64-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 daddiu $4, $28, %tlsgd(x)              R_MIPS_TLS_GD   x

Outstanding relocations (64-bit mode):

    GOT[n]                                      R_MIPS_TLS_DTPMOD64 x
    GOT[n+1]                                    R_MIPS_TLS_DPTOFF64 x

At the end of the code sequence the address of |x| is available in |$2|.


    Local Dynamic TLS Model

Code sequence I (32-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM   x
         ...
    0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
    0x14 addiu $3, $3, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1
    0x18 addu $3, $3, $2
         ...
    0x1c lui $3, %hi(%dtpoff(x2))               R_MIPS_TLS_LDO_HI16 x2
    0x20 addiu $3, $3, %lo(%dtpoff(x2))         R_MIPS_TLS_LDO_LO16 x2
    0x24 addu $3, $3, $2

Outstanding relocations (32-bit mode):

    GOT[n]                                      R_MIPS_TLS_DTPMOD32 x1

Code sequence I (64-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM   x
         ...
    0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
    0x14 addiu $3, $3, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1
    0x18 daddu $3, $3, $2
         ...
    0x1c lui $3, %hi(%dtpoff(x2))               R_MIPS_TLS_LDO_HI16 x2
    0x20 addiu $3, $3, %lo(%dtpoff(x2))         R_MIPS_TLS_LDO_LO16 x2
    0x24 daddu $3, $3, $2

Outstanding relocations (32-bit mode):

    GOT[n]                                      R_MIPS_TLS_DTPMOD32 x1

After the instruction at |0x18| has executed, the address of |x1| is
available in |$3|.

If rather than needing the address of the variable, the variable is to
be read or written, then the following code sequences may be used
instead of either of the sequences above.

Code Sequence II (32-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
         ...
    0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
    0x14 addu $3, $3, $2
    0x18 lw $3, %lo(%dtpoff(x1))($3)            R_MIPS_TLS_LDO_LO16 x1

Code Sequence II (64-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
         ...
    0x10 lui $3, %hi(%dtpoff(x1))               R_MIPS_TLS_LDO_HI16 x1
    0x14 daddu $3, $3, $2
    0x18 lw $3, %lo(%dtpoff(x1))($3).           R_MIPS_TLS_LDO_LO16 x1

Here, |lw| may be replaced with any other load/store insturction, using
the same opcode format as |lw|, such as |lb|, |lbu|, |lh|, |lwu|, |ld|,
|sb|, |sh|, |sw|, |sd|, |ll|, |lld|, |lwl|, |lwr|, |ldl|, or |ldr|.

If the size of the TLS area is known to be smaller than 32K, then the
following sequences can be used instead of those above.

Code sequence III (32-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
         ...
    0x10 addiu $3, $2, %lo(%dtpoff(x1))         R_MIPS_TLS_LDO_LO16 x1

Code sequence III (64-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
         ...
    0x10 daddiu $3, $2, %lo(%dtpoff(x1))        R_MIPS_TLS_LDO_LO16 x1

The outstanding relocations are as for Code Sequence I.

If, rather than needing the address of the variable, the variable is to
be read or written, and the size of the TLS area is known to be smaller
than 32K, then the following code sequences may be used instead of
either of the sequences above.

Code sequence IV (32-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 addiu $4, $28, %tlsldm(x)              R_MIPS_TLS_LDM  x
         ...
    0x10 lw $3, %lo(%dtpoff(x1))($2)            R_MIPS_TLS_LDO_LO16 x1

Code sequence IV (64-bit mode):

    0x00 lw $25, %call16(__tls_get_addr)($28)   R_MIPS_CALL16   g
    0x04 jalr $25
    0x08 daddiu $4, $28, %tlsldm(x)             R_MIPS_TLS_LDM  x
         ...
    0x10 lw $3, %lo(%dtpoff(x1))($2)            R_MIPS_TLS_LDO_LO16 x1

Here, |lw| may be replaced with any other load/store insturction, as above.


    Linker Optimizations

Not yet specified.

Additional relocations may be required to mark instructions that the
linker can transform.


    ELF Definitions

New relocations:

    #define R_MIPS_TLS_DTPMOD32         38
    #define R_MIPS_TLS_DTPOFF32         39
    #define R_MIPS_TLS_DTPMOD64         40
    #define R_MIPS_TLS_DTPOFF64         41
    #define R_MIPS_TLS_GD               42
    #define R_MIPS_TLS_LDM              43
    #define R_MIPS_TLS_LDO_HI16         44
    #define R_MIPS_TLS_LDO_LO16         45