C++ ABI Closed Issues

Revised 17 November 2000


Issue Status

In the following sections, the class of an issue attempts to classify it on the basis of what it likely affects. The identifiers used are:
call: Function call interface, i.e. call linkage
data: Data layout
lib: Runtime library support
lif: Library interface, i.e. API
g: Potential gABI impact
ps: Potential psABI impact
source: Source code conventions (i.e. API, not ABI)
tools: May affect how program construction tools interact


A. Object Layout Issues

# | Issue | Class | Status | Source | Opened | Closed
A-1 | Vptr location | data | closed | SGI | 990520 | 990624
Summary: Where is the Vptr stored in an object (first or last are the usual answers).

[990610 All] Given the absence of addressing modes with displacements on IA-64, the consensus is to answer this question with "first."

[990617 All] Given a Vptr and only non-polymorphic bases, which (Vptr or base) goes at offset 0?

Tentative decision: Vptr always goes at beginning.

[990624 All] Accepted tentative decision. Rename, close this issue, and open separate issue (B-6) for Vtable layout.
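The effect of this decision can be observed from C++ with a small measurement. The following is a minimal sketch, not part of the ABI text: the class `Shape` and the helper `first_member_offset` are hypothetical names, and the result assumes a compiler that stores the Vptr first and uses pointer-sized Vptrs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical polymorphic class: the virtual destructor forces a Vptr.
struct Shape {
    virtual ~Shape() {}
    int sides;
};

// Byte distance from the start of a Shape to its first data member.
// With the Vptr stored first, this is expected to equal sizeof(void*).
inline std::size_t first_member_offset() {
    Shape s;
    s.sides = 3;
    const char* base   = reinterpret_cast<const char*>(&s);
    const char* member = reinterpret_cast<const char*>(&s.sides);
    assert(member >= base);  // the member lies inside the object
    return static_cast<std::size_t>(member - base);
}
```

If the Vptr were stored last instead, the same helper would report zero.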

# | Issue | Class | Status | Source | Opened | Closed
A-2 | Virtual base classes | data | closed | SGI | 990520 | 990624
Summary: Where are the virtual base subobjects placed in the class layout? How are data member accesses to them handled?

[990610 Matt] With regard to how data member accesses are handled, the choices are to store either a pointer or an offset in the Vtable. The consensus seems to be to prefer an offset.

[990617 All] Any number of empty virtual base subobjects (rare) will be placed at offset zero. If there are no non-virtual polymorphic bases, the first virtual base subobject with a Vpointer will be placed at offset zero. Finally, all other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.

[990624 All] Define an empty object as one with no non-static, non-empty data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes. Define a nearly empty object as one which contains only a Vptr. The above resolution is accepted, restated as follows:

Any number of empty virtual base subobjects (rare, because they cannot have virtual functions or bases themselves) will be placed at offset zero, subject to the conflict rules in A-3 (i.e. this cannot result in two objects of the same type at the same address). If there are no non-virtual polymorphic base subobjects, the first nearly empty virtual base subobject will be placed at offset zero. Any virtual base subobjects not thus placed at offset zero will be allocated at the end of the class, in left-to-right, depth-first declaration order.
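Because virtual base subobjects are allocated at the end of the complete object, their distance from a given base subobject is not fixed, which is why accesses go through an offset found at run time. The following sketch illustrates this from the user's side; the types `V`, `A`, `D` and both helpers are hypothetical, and the actual distances are whatever the compiler's layout produces.

```cpp
#include <cassert>
#include <cstddef>

struct V { int v; };               // hypothetical virtual base
struct A : virtual V { int a; };   // V is not part of A's non-virtual part
struct D : A { int d; };           // D's own members may push V further out

// Distance in bytes from an A subobject to its virtual base V.
// The A* -> V* conversion is resolved at run time, because the
// distance depends on the complete object the A belongs to.
inline std::ptrdiff_t vbase_distance(A* a) {
    assert(a != nullptr);
    V* v = a;  // implicit conversion to the virtual base
    return reinterpret_cast<char*>(v) - reinterpret_cast<char*>(a);
}

// Reads V::v through an A* in two different complete objects.
inline bool vbase_reads_are_correct() {
    A alone;
    alone.v = 1;
    D whole;
    whole.v = 2;
    A* in_alone = &alone;
    A* in_whole = &whole;   // same static type, different complete object
    V* v1 = in_alone;
    V* v2 = in_whole;
    return v1->v == 1 && v2->v == 2;
}
```

Calling `vbase_distance` on `&alone` and on the `A` part of `whole` may give two different values even though both arguments have type `A*`.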

# | Issue | Class | Status | Source | Opened | Closed
A-3 | Multiple inheritance | data | closed | SGI | 990520 | 990701
Summary: Define the class layout in the presence of multiple base classes.

[990617 All] At offset zero is the Vptr whenever there is one, as well as the primary base class if any (see A-7). Also at offset zero is any number of empty base classes, as long as that does not place multiple subobjects of the same type at the same offset. If there are multiple empty base classes such that placing two of them at offset zero would violate this constraint, the first is placed there. (First means in declaration order.)

All other non-virtual base classes are laid out in declaration order at the beginning of the class. All other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.

The above ignores issues of padding for alignment, and possible reordering of class members to fit in padding areas. See issue A-9.
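The empty-base rule in the 990617 entry can be illustrated with a short C++ sketch. All type and function names here are hypothetical. The first check relies only on the language rule that two subobjects of the same type have distinct addresses; the second assumes a compiler that places a single empty base at offset zero.

```cpp
#include <cassert>

struct Empty {};                        // no data, no virtuals, no bases
struct Left  : Empty {};                // still empty
struct Right : Empty {};                // still empty
struct Both  : Left, Right { int x; };  // holds two distinct Empty subobjects

// True when the two Empty subobjects of a Both object have different
// addresses, i.e. they were not both placed at offset zero.
inline bool empty_bases_are_distinct() {
    Both b;
    b.x = 0;
    Empty* via_left  = static_cast<Left*>(&b);
    Empty* via_right = static_cast<Right*>(&b);
    assert(via_left != nullptr && via_right != nullptr);
    return via_left != via_right;
}

// A single empty base can share offset zero with the first member,
// so it is expected to add nothing to the size.
struct One : Empty { int x; };
inline bool single_empty_base_costs_nothing() {
    return sizeof(One) == sizeof(int);
}
```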

[990624 All] There remains an issue concerning the selection of the primary base class (see A-7), but we are otherwise in agreement. We will attempt to close this on 1 July, modulo A-7.

[990701 All] This issue is closed. A full description of the class layout can be found in issue A-9. (At this time, A-7 remains to be closed, waiting for the Taligent rationale.)

# | Issue | Class | Status | Source | Opened | Closed
A-4 | Empty base classes | data | closed | SGI | 990520 | 990624
Summary: Where are empty base classes allocated? (An empty base class is one with no non-static data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.)

[990624 All] Closed as a duplicate of A-3.

# | Issue | Class | Status | Source | Opened | Closed
A-5 | Empty parameters | data | closed | SGI | 990520 | 001117
Summary: When passing a parameter with an empty class type by value, what is the convention?
Resolution: Except for cases of non-trivial copy constructors (see C-7) and parameters in the variable part of varargs lists, a single parameter slot will be allocated to empty parameters, as though they were a struct containing a single character.

[990623 SGI] We propose that no parameter slot be allocated to such parameters, i.e. that no register be used, and that no space in the parameter memory sequence be used. This implies that the callee must allocate storage at a unique address if the address is taken (which we expect to be rare).

[990624 All] In addition to the address-taken case, care is required if the object has a non-trivial copy constructor. HP observes that in (some?) such cases, they perform the construction at the call site and pass the object by reference.

[990625 SGI -- Jim] I understand that the Standard explicitly allows elimination of even non-trivial copy construction in some cases. Is this one of them? Where should I look? Also, of course, varargs processing for elided empty parameters would need to be careful.

I have opened a new issue (C-7) for passing copy-constructed parameters by reference. Since doing so would turn an empty value parameter into a non-empty reference parameter, this issue can ignore such cases.

[990701 All] An empty parameter will not occupy a slot in the parameter sequence unless:

  1. its type is a class with a non-trivial copy constructor; or
  2. it corresponds to the variable part of a varargs parameter list.

Daveed and Matt will pursue the question of when copy constructors may be ignored for parameters with the Core committee, and if they identify cases where the constructors may clearly be omitted, those (empty) parameters will also be elided.

[001109 CodeSourcery -- Mark] Both g++ and the HP compiler have great difficulty dealing with this, and prefer to reserve the parameter slot even for empty parameters. At the meeting, we tentatively decided to reverse our decision and allocate an integer parameter slot even for empty parameters. We will place no constraints on the data in the parameter slot, except that on IA-64, it must not be NaT data.

[001117 All -- Jim] There having been no objection to the proposed resolution, it is adopted. Results will be treated the same way.

# | Issue | Class | Status | Source | Opened | Closed
A-6 | RTTI .o representation | data call ps | closed | SGI | 990520 | 991028
Summary: Define the data structure to be used for RTTI, that is:
  • for user type_info calls;
  • for dynamic_cast implementation; and
  • for exception-handling.
Resolution: Defined in the Draft C++ ABI for IA-64.

[990701 All] Daveed will put together a proposal by the 15th (action #13); the group will discuss it on the 22nd.

[990805 All] Daveed should have his proposal together for discussion. Michael Lam will look into the Sun dynamic cast algorithm.

It was noted that appropriate name selection along with the normal DSO global name resolution should be sufficient to produce a unique address for each class' RTTI struct, which address would then be a suitable identifier for comparisons.

[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.

[990819 EDG -- Daveed] (Proposal replaced by later version on 6 October.)

[990826 All] Discussion centered on whether the representation should include all base classes or just the direct ones, and in the former case how hashing might be handled. It was agreed that the __qualifier_type_info variant is not needed, and it is now stricken from the above proposal. Also, a pointer-to-member variant is needed. Christophe will provide a description of the HP hashing approach, and Daveed will update the specification.


[991006 EDG -- Daveed]

Run-time type information

The C++ programming language definition implies that information about types be available at run time for three distinct purposes:

  1. to support the typeid operator,
  2. to match an exception handler with a thrown object, and
  3. to implement the dynamic_cast operator.
Purpose (3) requires type information only about polymorphic class types, but (1) and (2) may apply to other types as well; for example, when a pointer to an int is thrown, it can be caught by a handler that catches "int const*".
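The three uses of run-time type information listed above can be exercised directly in C++. This is a small illustrative sketch; the classes `Base` and `Derived` and the function names are hypothetical and are not taken from the proposal.

```cpp
#include <cassert>
#include <typeinfo>

struct Base    { virtual ~Base() {} };
struct Derived : Base {};

// typeid reports the dynamic type of a polymorphic object.
inline bool typeid_sees_dynamic_type() {
    Derived d;
    Base& b = d;
    return typeid(b) == typeid(Derived);
}

// Handler matching needs type information for non-class types too:
// a thrown int* is caught by a handler for int const*.
inline bool int_ptr_caught_as_const_ptr() {
    static int value = 7;
    try {
        throw &value;
    } catch (int const* p) {
        return p == &value;
    } catch (...) {
        return false;
    }
}

// dynamic_cast succeeds only when the object really is a Derived.
inline bool dynamic_cast_checks_type() {
    Derived d;
    Base plain;
    Base* pd = &d;
    Base* pb = &plain;
    return dynamic_cast<Derived*>(pd) == &d &&
           dynamic_cast<Derived*>(pb) == nullptr;
}
```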

Deliberations

The following conclusions were arrived at by the attending members of the C++ IA-64 ABI group:

The full proposal has been incorporated in the Draft C++ ABI for IA-64.


[991014 all]

  1. Do we keep pointers to direct bases only, or to indirect bases as well? It is believed that keeping pointers to indirect bases speeds up dynamic_cast by a constant factor, but at the cost of extra space even when dynamic_cast is never used. There is a general preference for keeping direct bases only.

  2. The current proposal has a flag to differentiate the single-inheritance case from the multiple-inheritance case. Jason suggests instead splitting the two cases into two separate classes, and there was general agreement that this is a good idea.

  3. The current proposal has separate classes for various kinds of non-class types. Jason suggests merging all non-class types into a single class. Nobody had strong feelings, or strong arguments either for or against this change. In the absence of a consensus in favor of this change, we'll keep the proposal as is.

  4. Minor changes: There's a typo in the pointer to member part, which Daveed will fix. Jason suggests flipping the sign on the offset, and nobody objected.

ACTION ITEMS: Daveed---make these changes. Jim---incorporate these changes into the open issues list. We are almost ready to close this issue; we intend to close it at the 28 October meeting, after we've all had a chance to go over the modified writeup.


[991028 all] The current definition, in the Draft C++ ABI for IA-64, has been updated with Daveed's changes, and is accepted. Note that we are back to using a pointer to RTTI in the vtable (see B-8), since we need uniqueness. Since we need an external symbol in any case, the ABI will make no statement about where RTTI is allocated. It is likely that implementations will use COMDAT for it.

# | Issue | Class | Status | Source | Opened | Closed
A-7 | Vptr sharing with primary base class | data | closed | HP | 990603 | 990729
Summary: It is in general possible to share the virtual pointer with a polymorphic base class (the primary base class). Which base class do we use for this?
Resolution: Share with the first non-virtual polymorphic base class, or if none with the first nearly empty virtual base class.

[990617 All] It will be shared with the first polymorphic non-virtual base class, or if none, with the first nearly empty polymorphic virtual base class. (See A-2 for the definition of nearly empty.)

[990624 All] HP noted that Taligent chooses a base class with virtual bases before one without as the primary base class, probably to avoid additional "this" pointer adjustments. SGI observed that such a rule would prevent users from controlling the choice by their ordering of the base classes in the declaration. The bias of the group remains the above resolution, but HP will attempt to find the Taligent rationale before this is decided.

[990729 All] Close with the agreed resolution. If a convincing Taligent rationale is found, we can reconsider.
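One visible consequence of sharing the Vptr with the first non-virtual polymorphic base is that this base sits at offset zero, so converting to it needs no "this" adjustment, while other polymorphic bases do need one. A minimal sketch follows, with hypothetical types `First`, `Second`, `Child` and a hypothetical helper `base_offset`; the offsets assume a compiler that follows this resolution.

```cpp
#include <cassert>
#include <cstddef>

struct First  { virtual ~First()  {} int a; };
struct Second { virtual ~Second() {} int b; };
struct Child : First, Second { int c; };   // First is the expected primary base

// Byte offset of the BaseT subobject within a Child object.
template <class BaseT>
std::ptrdiff_t base_offset(Child& obj) {
    BaseT* base = &obj;  // implicit derived-to-base conversion
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&obj);
}
```

Reordering the base list to `Second, First` would be expected to swap the two results, which is the user control over the choice that SGI mentions above.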

# | Issue | Class | Status | Source | Opened | Closed
A-8 | (Virtual) base class alignment | data | closed | HP | 990603 | 990624
Summary: A (virtual) base class may have a larger alignment constraint than a derived class. Do we agree to extend the alignment constraint to the derived class? (An alternative for virtual bases: allow the virtual base to move in the complete object.)

[990623 SGI] We propose that the alignment of a class be the maximum alignment of its virtual and non-virtual base classes, non-static data members, and Vptr if any.

[990624 All] Above proposal accepted. (SGI observation: the size of the class is rounded up to a multiple of this alignment, per the underlying psABI rules.)
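The accepted rule can be checked with C++11 `alignas` and `alignof`, which postdate this discussion and are used here only for illustration. The types and helpers below are hypothetical names; the sketch checks that a derived class is at least as strictly aligned as its base, for both a non-virtual and a virtual base, and that the size is a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>

struct alignas(16) WideBase { char tag; };        // over-aligned base
struct Narrow : WideBase { char flag; };          // non-virtual inheritance
struct VWide  : virtual WideBase { char flag; };  // virtual inheritance

// The derived class must be at least as strictly aligned as its base.
template <class DerivedT, class BaseT>
constexpr bool inherits_alignment() {
    return alignof(DerivedT) >= alignof(BaseT);
}

// The size of a class is rounded up to a multiple of its alignment.
template <class T>
constexpr bool size_is_rounded_up() {
    return sizeof(T) % alignof(T) == 0;
}
```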

# | Issue | Class | Status | Source | Opened | Closed
A-9 | Sorting fields as allowed by [class.mem]/12 | data | closed | HP | 990603 | 990624
Summary: The standard constrains ordering of class members in memory only if they are not separated by an access clause. Do we use an access clause as an opportunity to fill the gaps left by padding?
Resolution: See separate writeup of Draft C++ ABI for IA-64.

[990610 all] Some participants want to avoid attempts to reorder members differently than the underlying C struct ABI rules. Others think there may be benefit in reordering later access sections to fill holes in earlier ones, or even in base classes.

[990617 all] There are several potential reordering questions, more or less independent:

  1. Do we reorder whole access regions relative to one another?
  2. Do we attempt to fill padding in earlier access regions with initial members from later regions?
  3. Do we fill the tail padding of non-POD base classes with members from the current class?
  4. Do we attempt to fill interior padding of non-POD base classes with later members?

There is no apparent support for (1), since no simple heuristic has been identified with obvious benefits. There is interest in (2), based on a simple heuristic which might sometimes help and will never hurt. However, it is not clear that it will help much, and Sun objects on grounds that they prefer to match C struct layout. Unless someone is interested enough to implement and run experiments, this will be hard to agree upon. G++ has implemented (3) as an option, based on specific user complaints. It clearly helps HP's example of a base class containing a word and flag, with a derived class adding more flags. Idea (4) has more problems, including some non-intuitive (to users) layouts, and possibly complicating the selection of bitwise copy in the compiler.

[990624 all] We will not do (1), (2), or (4). We will do (3). Specifically, allocation will be in modified declaration order as follows:

  1. Vptr if any, and the primary base class per A-7.
  2. Any empty base classes allocated at offset zero per A-3.
  3. Any remaining non-virtual base classes.
  4. Any non-static data members.
  5. Any remaining virtual base classes.
Each subobject allocated is placed at the next available position that satisfies its alignment constraints, as in the underlying psABI. This is interpreted with the following special cases:
  1. The "next available position" after a non-POD class subobject (base class or data member) with tail padding is at the beginning of the tail padding, not after it. (For POD objects, the tail padding is not "available.")
  2. Empty classes are considered to have alignment and size 1, consisting solely of one byte of tail padding.
  3. Placement on top of the tail padding of an empty class must avoid placing multiple subobjects of the same type at the same address.
After allocation is complete, the size is rounded up to a multiple of alignment (with tail padding).
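The tail-padding rule, option (3), can be made concrete with HP's example of a base class holding a word and a flag and a derived class adding more flags. This is a sketch under assumptions: the type names and helpers are hypothetical, the user-provided constructor is there to make the base non-POD, and whether the padding is actually reused depends on the compiler following this layout.

```cpp
#include <cassert>
#include <cstddef>

// Non-POD base: a word and a flag, followed by tail padding.
struct WordAndFlag {
    WordAndFlag() : word(0), flag(0) {}
    long long word;
    char flag;
};

// Derived class adding more flags.
struct MoreFlags : WordAndFlag {
    MoreFlags() : flag2(0), flag3(0) {}
    char flag2;
    char flag3;
};

// Byte offset of a char member within a MoreFlags object.
inline std::size_t offset_in(const MoreFlags& obj, const char& member) {
    const char* base = reinterpret_cast<const char*>(&obj);
    assert(&member >= base);  // the member lies inside the object
    return static_cast<std::size_t>(&member - base);
}

// True when flag2 was placed inside the base class's tail padding.
inline bool derived_reuses_tail_padding() {
    MoreFlags m;
    return offset_in(m, m.flag2) < sizeof(WordAndFlag);
}
```

When the padding is reused, `sizeof(MoreFlags)` can equal `sizeof(WordAndFlag)`; without reuse the derived class grows by a full alignment unit.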

[990722 all] The precise placement of empty bases when they don't fit at offset zero remained imprecise in the original description. Accordingly, a precise layout algorithm is described in a separate writeup of Data Layout.

[990729 all] The layout writeup was accepted, with the first choice for empty base placement. That is, if placement at offset zero doesn't work, it will be placed like a normal base/member. The consensus was that this won't happen often, and such bases will often overlap with the preceding tail padding or following components anyway. Jim will modify the writeup accordingly.

# | Issue | Class | Status | Source | Opened | Closed
A-10 | Class parameters in registers | call | closed | HP | 990603 | 990710
Summary: The C ABI specifies that structs are passed in registers. Does this apply to small non-POD C++ objects passed by value? What about the copy constructor and this pointer in that case?

[990701 all] A separate issue (C-7) deals with cases where a non-trivial copy constructor is required; we ignore those cases here. Our conclusion is that, without a non-trivial copy constructor, we need not be concerned about the class object moving in the process of being passed, and there is no need to use a mechanism different from the base ABI C struct mechanism. At the same time, if we do use the underlying C struct mechanism, the user has complete control of the passing technique, by choosing whether to pass by value or reference/pointer.

Therefore, except in cases identified by issue C-7 for different treatment, class parameters will be passed using the underlying C struct protocol.

# | Issue | Class | Status | Source | Opened | Closed
A-11 | Pointers to member functions | data | closed | Cygnus | 990603 | 990812
Summary: How should pointers to member functions be represented?
Resolution: As a pair of values, described below.

[990729 All] Jason described the g++ implementation, which is a three-member struct:

  1. The adjustment to this.
  2. The Vtable index plus one of the function, or -1. (Zero is a NULL pointer.)
  3. If (2) is an index, the offset from the full object to the member function's Vtable. If -1, a pointer to the function (non-virtual).

A concern about covariant returns was raised. It was observed that, given our decision to use distinct Vtable entries for distinct return types, no further concern is required here. Others will describe their representations. IBM has an alternative, but it is believed to be patented by Microsoft.

[990805 All] It is agreed that a two-element struct will be used for a pointer to a member function, with elements as follows:

ptr:
For a non-virtual function, this field is a simple function pointer. (Under current base IA-64 psABI conventions, this is a pointer to a GP/function address pair.) For a virtual function, it is 1 plus twice the Vtable offset of the function. The value zero is a NULL pointer.

adj:
The required adjustment to this.

Although we agreed to close this, SGI suggests a minor modification. Since the Vtable offset of a virtual function will always be even, we suggest that it not be doubled before adding 1. This is because shifts are more restricted on many processors than other integer ALU operations (shifters are large structures), so an XOR or NAND will often be cheaper than a right shift.

[990812 All] Close this issue with the suggested modification.
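The two-element representation, with SGI's modification (1 plus the undoubled Vtable offset), can be modelled in ordinary C++. This is only a model of the encoding as described in this issue: the struct `MemFnPtrModel` and all function names are hypothetical, a real compiler builds the value internally, and the sketch assumes function addresses are even so that bit 0 can mark virtual functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the agreed two-element struct.
struct MemFnPtrModel {
    std::uintptr_t ptr;  // function address, or 1 + Vtable offset
    std::ptrdiff_t adj;  // required adjustment to `this`
};

inline MemFnPtrModel make_virtual(std::size_t vtable_offset, std::ptrdiff_t adj) {
    assert(vtable_offset % 2 == 0);  // offsets are even, so bit 0 is free
    MemFnPtrModel m = { static_cast<std::uintptr_t>(vtable_offset) + 1, adj };
    return m;
}

inline MemFnPtrModel make_nonvirtual(std::uintptr_t fn_address, std::ptrdiff_t adj) {
    assert(fn_address % 2 == 0);     // assumption: code addresses are even
    MemFnPtrModel m = { fn_address, adj };
    return m;
}

inline bool is_null(const MemFnPtrModel& m)    { return m.ptr == 0; }
inline bool is_virtual(const MemFnPtrModel& m) { return (m.ptr & 1) != 0; }

// Recover the Vtable offset by clearing bit 0; no shift is needed.
inline std::size_t vtable_offset_of(const MemFnPtrModel& m) {
    assert(is_virtual(m));
    return static_cast<std::size_t>(m.ptr ^ 1);
}
```

Note that a virtual function at Vtable offset zero encodes as 1, so it stays distinguishable from the NULL value zero.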

# | Issue | Class | Status | Source | Opened | Closed
A-12 | Merging secondary vtables | data | closed | Sun | 990610 | 990805
Summary: Sun merges the secondary Vtables for a class (i.e. those for non-primary base classes) with the primary Vtable by appending them. This allows their reference via the primary Vtable entry symbol, minimizing the number of external symbols required in linking, in the GOT, etc.
Resolution: Concatenate the Vtables associated with a class in the same order that the corresponding base subobjects are allocated in the object.

[990701 Michael Lam] Michael will check what the Sun ABI treatment is and report back.

[990729 All] A separate issue raised in conjunction with A-7 is whether to include Vfunc pointers in the primary Vtable for functions defined only in the base classes and not overridden. If the primary and secondary Vtables are concatenated, this is no longer an issue, since all can be referenced from the primary Vptr.

[990805 All] All of the Vtables associated with a class will be concatenated, and a single external symbol used (to be identified as part of the mangling issue F-1). The order of the tables will be the same as the order of base class subobjects in an object of the class, i.e. first the primary Vtable, then the non-virtual base classes in declaration order, and finally the virtual base classes in depth-first declaration order.

# | Issue | Class | Status | Source | Opened | Closed
A-13 | Parameter struct field promotion | call | closed | SGI | 990603 | 990701
Summary: It is possible to pass small classes either as memory images, as is specified by the base ABI for C structs, or as a sequence of parameters, one for each member. Which should be done, and if the latter, what are the rules for identifying "small" classes?
Resolution: No special treatment will be specified by the ABI.

[990701 all] Define no special treatment for this case in the ABI. A translator with control over both caller and callee may choose to optimize.

# | Issue | Class | Status | Source | Opened | Closed
A-14 | Pointers to data members | data | closed | SGI | 990729 | 990805
Summary: How should pointers to data members be represented?
Resolution: Represented as one plus the offset from the base address.

[990729 SGI] We suggest an offset from the base address of the class, represented as a ptrdiff_t.

[990805 All] Such pointers are represented as one plus the offset from the base address of the class, as a ptrdiff_t. NULL pointers are zero.
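The "one plus the offset" encoding exists so that a member at offset zero is not confused with the NULL value. The sketch below models the encoding exactly as recorded in this entry; the alias `DataMemberPtrModel`, the struct `Point`, and the helper functions are hypothetical. A production compiler may use a different encoding, so the sketch models the text rather than inspecting a real pointer to member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: the pointer to data member is a ptrdiff_t.
typedef std::ptrdiff_t DataMemberPtrModel;

inline DataMemberPtrModel encode_member(std::size_t offset) {
    return static_cast<std::ptrdiff_t>(offset) + 1;  // one plus the offset
}

inline bool is_null_member(DataMemberPtrModel p) { return p == 0; }

inline std::size_t member_offset(DataMemberPtrModel p) {
    assert(!is_null_member(p));
    return static_cast<std::size_t>(p - 1);
}

// Apply a model pointer to an object: base address plus decoded offset.
template <class Member, class Object>
Member& apply_member(Object& obj, DataMemberPtrModel p) {
    char* base = reinterpret_cast<char*>(&obj);
    return *reinterpret_cast<Member*>(base + member_offset(p));
}

struct Point { int x; int y; };  // example class with a member at offset 0
```

For `Point`, the member `x` at offset zero encodes as 1, which is why the bias is needed.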

# | Issue | Class | Status | Source | Opened | Closed
A-15 | Empty bit-fields | data | closed | CodeSourcery | 991214 | 000106
Summary: How are zero-length bit-fields handled?
Resolution: Zero-length bit-fields do not prevent a class from being considered empty or nearly empty.

[991214 CodeSourcery -- Mark]

Question: Does the presence of a zero-width bit-field prevent a class from being empty?

Suggested Resolution: No. Amend the definition of an "empty class" to read:

A class with no non-static data members other than zero-width bitfields, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.

Amend the definition of a "nearly empty class" to read:

A class, the objects of which contain only a Vptr and zero-width bitfields.

[000106 All] Accept the CodeSourcery proposal.
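The C++11 trait `std::is_empty` uses essentially the same definition, ignoring bit-fields of length zero, so it offers a quick way to illustrate the amended wording. The types and helper names below are hypothetical, and the trait is a library facility that postdates this issue.

```cpp
#include <cassert>
#include <type_traits>

struct OnlyZeroWidth { int : 0; };          // nothing but a zero-width bit-field
struct HasData       { int : 0; char c; };  // also has a real data member

// A zero-width bit-field does not prevent the class from being empty.
inline bool zero_width_class_is_empty() {
    return std::is_empty<OnlyZeroWidth>::value;
}

// A real non-static data member does.
inline bool data_class_is_empty() {
    return std::is_empty<HasData>::value;
}
```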

# | Issue | Class | Status | Source | Opened | Closed
A-16 | Nearly empty virtual bases | data | closed | SGI | 991228 | 000106
Summary: May a class with non-empty, non-primary, virtual base classes be treated as nearly empty (and thus eligible to be a primary base) if its only non-vptr data is in its virtual base classes?
Resolution: Virtual base classes do not prevent a class from being considered nearly empty.

[000106 All] Accept the proposal.

# | Issue | Class | Status | Source | Opened | Closed
A-17 | Primary indirect virtual base allocation | data | closed | SGI | 991228 | 000113
Summary: When a nearly empty virtual base class A is allocated as the primary base class of class B, and then B is allocated as a base class of C, should A (i.e. its vptr) be separately allocated in C, or should its first occurrence in a previously allocated base B be used as its allocation in C?
Resolution: Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order.

[991228 SGI -- Jim] Specific wording for a proposed change is in the Draft C++ ABI for IA-64.

[000103 CodeSourcery -- Mark] I think the current proposal for allocating virtual bases is still a little suboptimal. In particular, given:

  struct A { void f(); };
  struct B : virtual public A { };
  struct C : virtual public A, virtual public B { };
we'll give `C' a larger size than for:
  struct C : virtual public B, virtual public A { };
where we'll reuse the `A' part of `B' rather than reallocating it.

I know that ordering can already affect size (principally because of alignment issues) but I think that in this case we might as well not punish programmers for choosing the "wrong" ordering.

I think we should change the green A-17 proposed resolution to indicate that if one of the virtual bases is a (direct or indirect) primary base of one of the other virtual bases then we need not allocate a fresh copy.

FWIW, it turns out to actually be easier in GCC to code the more generous version.

The algorithm to do this is linear in the size of the hierarchy: just iterate through the inheritance DAG marking all primary bases. Any virtual base classes that remain unmarked need to be allocated in step III. A slight formalization of this sentence might be a good way to express which bases to choose for III.


[000113 All] Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order.

# | Issue | Class | Status | Source | Opened | Closed
A-18 | Virtual base alignment | data | closed | SGI | 991228 | 000113
Summary: Should virtual bases have a different effect on class alignment than other components?
Resolution: Yes. When allocating the non-virtual part of a base class, use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.

[991228 SGI -- Jim] Since the allocation of virtual bases is "floating" relative to the classes in which they occur, it is possible for them to have independent alignment constraints. Specifically, when allocating a base class with a virtual base, we could treat its alignment as that obtained by ignoring the virtual base, and later allocate the virtual base with greater alignment.

Since the class with a virtual base already has a vptr, this only matters if the virtual base contains components more strictly aligned than a pointer. Thus, the benefit of doing so is probably not large. To get some idea of the effect on the layout definition, look at dsize and nvsize, and assume a similar pair of alignment values.

[000106 All] No strong opinions were expressed on this issue. We will decide it at the next meeting after people have a chance to think it over. The bias will be to keep the current simpler definition.

[000113 All] It turns out that both Compaq and someone else (Cygnus?) already do this, find it straightforward, and prefer to keep it. Therefore, accept the suggestion that when allocating the non-virtual part of a base class, we use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.

# | Issue | Class | Status | Source | Opened | Closed
A-19 | Primary indirect virtual base choice | data | closed | All | 000106 | 000120
Summary: In allocating class C, when the first nearly empty virtual base class A is allocated as the primary base class of a later nearly empty virtual base class B, should A or B become the primary base class of C?
Resolution: Do not use a virtual base as primary if it is already a primary base of some other direct or indirect base, unless such are the only candidates. In either case, use the first candidate in depth-first, left-to-right order in the inheritance graph.

[000106 All] This issue was initially confused in the discussion with A-17, but is independent. Recall that non-virtual bases have priority over virtual bases for selection as the primary base. Assuming that no non-virtual base is suitable, this issue involves which virtual base should be selected. Our original decision was to use the first in left-to-right order.

The proposal here is that, if this initial candidate A is itself already a primary base class of a later virtual base B, then B will be used instead, unless it is already a primary base class of a later virtual base, and so on. See proposed wording in the ABI layout document.

No one can identify a case in which this approach is worse than the original definition.

[000113 All] The proposed resolution on the table is to use the following priority to choose the primary base class:

  1. The first (left-to-right declaration order) super-polymorphic non-virtual base class.
  2. The first (left-to-right declaration order) nearly empty virtual base class that is not a primary base class of any other base, direct or indirect.
  3. The first (left-to-right declaration order) nearly empty virtual base class.

[000113 All] Modify the above to use any virtual base in the inheritance graph, first one that is not already primary to some base if possible, or then any candidate, chosen as the first in a depth-first, left-to-right inheritance graph walk.

# | Issue | Class | Status | Source | Opened | Closed
A-20 | Operator new array cookies | data | closed | All | 000113 | 000120
Summary: When operator new is used to create a new dynamic-length array, a cookie must be stored to remember the allocated length so that it can be deallocated correctly.
Resolution: In principle, place cookie immediately before array, aligned naturally. Use no cookie for array element types without destructors. See the Draft C++ ABI for IA-64.

[000113 All] The proposed resolution is as follows:

This resolution has the following consequences:

[000120 All] Accept the above.

# | Issue | Class | Status | Source | Opened | Closed
A-21 | Placement new array cookies | data | closed | All | 000113 | 000217
Summary: Same issue as A-20, except that for placement new, the user supplies already-allocated space. Therefore, there is a conflict between wanting to make delete() work on arrays created in this way, and wanting to avoid surprising users who haven't allocated enough space for the cookie. Also, are cookies allocated if there is no destructor?
Resolution: Use no cookie for element types with no destructors, nor for ::operator new[](size_t, void*). Otherwise, use a cookie as in issue A-20. See the Draft C++ ABI for IA-64.

[000119 SGI -- Matt]

What the standard says (3.7.3.1, 5.3.4, and 18.4.1.3)

Array placement new has the form "new(ARGS) T[n]". The "(ARGS)" part is optional. If it's present then this is a placement new-expression, and we use a version of operator new[] with two or more arguments, otherwise it's an ordinary new-expression, and we use a version of operator new[] with one argument. For the purposes of this proposal, the distinction isn't all that important.

After finding the appropriate operator new, a new-expression obtains storage with

void* p = operator new[](n1, ARGS),
where n1 >= n * sizeof(T). It then constructs n objects of type T starting at position p1, where p1 = p + delta. The return value is p1.

It is required (3.7.3.1/2) that the return value of any operator new[], whether it's built-in or provided by the user, must be suitably aligned for objects of any type.

If T is "char" or "unsigned char" the standard requires that delta is a nonnegative multiple of the most stringent alignment constraint for objects of size less than or equal to n (5.3.4/10). Otherwise the only restriction is that delta is nonnegative.

Some implementations store the number of elements in the array at a negative offset from p1. The standard neither requires nor forbids it.

There's a predefined placement version of array operator new,

::operator new[](size_t n1, void* p),
that does nothing but return p. p must be a pointer to the beginning of some array of size at least n1. The standard doesn't tell users how large an array they need. Many users probably assume that it's sufficient for the array to be of size n * sizeof(T), but there's no basis in the standard for that assumption.

IA-64 Specifics

On IA-64 long double is 80 bits. long double has 128-bit alignment, as do classes and unions containing long double, so sizeof(long double) is 16. All other types have at most 64-bit alignment.

What the ABI needs to specify

  1. Given n, T, sizeof(T), and alignof(T), what are n1 and delta?
    1. Are T=char and T=unsigned char special cases? (Or, perhaps, is sizeof(T)=1 a special case?)
    2. Is ::operator new[](size_t, void*) a special case?
    3. Is ::operator new[](size_t), which is used for non-placement new, a special case?
    4. Is ::operator new[](size_t, const nothrow_t&) a special case? I can't find anything in the standard guaranteeing that you can delete an array allocated with nothrow array new using an ordinary array delete-expression, but users probably expect it, and legitimately so.

  2. Do we store n at a negative offset from the return value of operator new[]? (This affects the answer to question 1.) If so, we need to specify precisely what that offset is.

Proposal A

No version of operator new[] is a special case. For any array new-expression we store the number of elements in the array, as a size_t, at an offset of -sizeof(size_t) from the pointer returned by the new-expression. For any type T other than char, unsigned char, long double, or a type containing a long double, n1 = n * sizeof(T) + sizeof(size_t). For those excepted types, since we need to preserve long double alignment, n1 = n * sizeof(T) + sizeof(long double).

Pseudocode for new(ARGS) T[n] under this proposal:

    if T = char or unsigned char, or if it has long double alignment,
      padding = sizeof(long double)
    else
      padding = sizeof(size_t)

    p = operator new[](n * sizeof(T) + padding, ARGS)

    p1 = (T*) (p + padding)
    *((size_t*) p1 - 1) = n

    for i = [0, n)
      create a T, using the default constructor, at p1[i]

    return p1
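
The pseudocode above can be sketched in C++ as follows. This is an illustration under stated assumptions rather than a definitive implementation: the constant 16 stands in for sizeof(long double) on IA-64 as described earlier, malloc stands in for operator new[], and cookie_padding, allocate_with_cookie, read_cookie, and ArrayBlock are hypothetical names. Element construction is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumption: sizeof(long double) on IA-64, as stated in the text.
const std::size_t kLongDoubleSize = 16;

// Padding reserved in front of the array under Proposal A.
inline std::size_t cookie_padding(bool is_char_like, std::size_t elem_align) {
    if (is_char_like || elem_align >= kLongDoubleSize) return kLongDoubleSize;
    return sizeof(std::size_t);
}

struct ArrayBlock {
    void* p;             // what the allocation function returned
    unsigned char* p1;   // where element 0 lives
};

// Allocate room for n elements and store n immediately below p1.
inline ArrayBlock allocate_with_cookie(std::size_t n, std::size_t elem_size,
                                       std::size_t padding) {
    assert(padding >= sizeof(std::size_t));
    void* p = std::malloc(n * elem_size + padding);  // sketch: no overflow check
    assert(p != nullptr);
    unsigned char* p1 = static_cast<unsigned char*>(p) + padding;
    std::memcpy(p1 - sizeof(std::size_t), &n, sizeof n);
    ArrayBlock b = {p, p1};
    return b;
}

// Recover the element count from the pointer a new-expression would return.
inline std::size_t read_cookie(const unsigned char* p1) {
    std::size_t n;
    std::memcpy(&n, p1 - sizeof n, sizeof n);
    return n;
}
```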

Proposal B

::operator new[](size_t, void*) is a special case. For that version of operator new[] only, n1 = n * sizeof(T). We do not store the number of elements in such an array anywhere.

Pseudocode for new(ARGS) T[n] under this proposal:

    If the expression is new(p) T[n], and if overload resolution
    determines we're using ::operator new[](size_t, void*), then
      p1 = (T*) p

      for i = [0, n)
        create a T, using the default constructor, at p1[i]

      return p1

For all other cases, same as proposal A.

Proposal A is simpler, but proposal B probably conforms more closely to user expectations.


[000210 All -- Matt] We agreed that Proposal B, where ::operator new[](size_t, void*) is a special case with no cookie, is preferable to Proposal A, where all versions of array new get cookies.

We also agreed to the variation where we don't reserve space for a cookie if the type has no destructor. We're calling it Proposal C. We need a writeup, but we should be able to close this issue next week.


[000302 CodeSourcery -- Mark] I believe the resolution to A-20/A-21, dealing with array new, is incorrect with respect to the C++ standard. (In other words, I think we'll make it impossible to implement the behavior required by the standard.)

In particular, there are situations in which we do not allocate cookies, even when allocating arrays of class type. But, the standard guarantees that:

[class.free]

When a delete-expression is executed, the selected deallocation function shall be called with the address of the block of storage to be reclaimed as its first argument and (if the two-parameter style is used) the size of the block as its second argument.

That paragraph doesn't require that the class type have a non-trivial destructor.

I think that means the first bullet:

No cookie is required if the array element type T has a trivial destructor (C++ standard, 12.4/3).
should read:
No cookie is required if the array element type T has a trivial destructor ([class.dtor]) and the usual (array) deallocation function ([basic.stc.dynamic.deallocation]) does not take two arguments.

(Note: if the usual array deallocation function takes two arguments, then its second argument is of type size_t. The standard guarantees that this function will be passed the number of bytes allocated with the previous array new expression. See [class.free] for details.)


[000302 All] Modification accepted.
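
The combined effect of Proposal B, the no-destructor variation, and the accepted modification can be summarized as a predicate. This is a sketch of the discussion as recorded here, not a quotation from the ABI document; ArrayNewFacts and needs_cookie are hypothetical names.

```cpp
#include <cassert>

// Facts about one array new-expression, as a compiler would know them.
struct ArrayNewFacts {
    bool global_placement_new;        // ::operator new[](size_t, void*) was selected
    bool trivial_destructor;          // element type T has a trivial destructor
    bool two_arg_usual_deallocation;  // usual operator delete[] takes (void*, size_t)
};

// True when space for the element count must be reserved.
inline bool needs_cookie(const ArrayNewFacts& f) {
    if (f.global_placement_new) return false;  // Proposal B special case
    // Without a cookie, delete[] could neither run n destructors nor
    // compute the block size for a two-argument deallocation function.
    return !f.trivial_destructor || f.two_arg_usual_deallocation;
}
```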

# Issue Class Status Source Opened Closed
A-22 RTTI for reference types data closed CodeSourcery 000119 000203
Summary: __reference_type_info does not appear to be necessary.
Resolution: Remove it.

[000119 CodeSourcery -- Nathan] When would a type_info of a reference ever be generated? (So why __ref_type_info?)

[000126 CodeSourcery -- Nathan]

[dcl.mptr] (8.3.3)/3
A pointer to member shall not point to ... a member with reference type

[000128 Cygnus -- Jason] Based on that, I definitely think reference type_info can go away.

[000203 All] Remove __ref_type_info.

# Issue Class Status Source Opened Closed
A-23 RTTI class descriptors data closed CodeSourcery 000124 000302
Summary: Resolve several questions about the RTTI representation of class types.
Resolution: See the Draft C++ ABI for IA-64.

[000124 CodeSourcery -- Nathan] si_class_type_info is for a single nonvirtual inheritance hierarchy. Presumably this single non-virtual inheritance is between the derived and the base (the base may or may not have multiple or virtual bases). An additional constraint is that, if the derived class is polymorphic, the base class is too. Rationale: if the derived class adds polymorphism, the base will be at a non-zero offset.

[000126 CodeSourcery -- Nathan] More useful for dynamic cast (and possibly catch matching) {than the current set of flags -- editor} would be the following flags:

Note that the virtual/non-virtual and public/non-public are not mutually exclusive. Also note that I have not actually implemented anything with these flags, so I could be wrong.

[class.mi] (clause 10.1) provides good examples of "diamond shaped." Paragraph 4 gives a non-diamond shaped graph with multiple base objects. At least one of the multiply inherited base objects must be non-virtual.

        struct L {};
        struct A : L {};
        struct B : L {};
        struct C : A, B {};

There are two distinct L base objects in C. C would have the non-diamond shaped multiple inheritance flag set. A, B and C would have the non-virtual base flag and public base flag set.

Paragraph 5 gives a diamond shaped graph. Such a multiply inherited base object must be virtual.

        struct V {};
        struct A : virtual V {};
        struct B : virtual V {};
        struct C : A, B {};

This time C would have the diamond shaped flag set. A, B & C would have the virtual base flag set and the public base flag set. C would also have the non-virtual base flag set.
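
The difference between the two shapes can be checked at run time by comparing base subobject addresses. The sketch below restates the paragraph 4 and paragraph 5 examples in separate namespaces; the namespace names, the tag member, and the two helper functions are additions made for illustration only.

```cpp
#include <cassert>

namespace repeated {          // [class.mi] paragraph 4 shape
struct L { int tag; };
struct A : L {};
struct B : L {};
struct C : A, B {};
}

namespace shared {            // [class.mi] paragraph 5 shape
struct V { int tag; };
struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};
}

// Non-diamond: the L reached through A differs from the L reached through B.
inline bool has_two_L_subobjects() {
    repeated::C c;
    repeated::L* via_a = static_cast<repeated::A*>(&c);
    repeated::L* via_b = static_cast<repeated::B*>(&c);
    return via_a != via_b;
}

// Diamond: both paths lead to the single virtual V subobject.
inline bool has_one_V_subobject() {
    shared::C c;
    shared::V* via_a = static_cast<shared::A*>(&c);
    shared::V* via_b = static_cast<shared::B*>(&c);
    return via_a == via_b;
}
```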

Paragraph 6 gives a graph which contains both features. Here there is one non-virtual base and one virtual base.

        struct B {};
        struct X : virtual B {};
        struct Y : virtual B {};
        struct Z : B {};
        struct AA : X, Y, Z {};

In that example, AA would have both diamond and non-diamond flags set. All would have the public base flag set, AA & Z would have the non-virtual base flag set, AA, X & Y would have the virtual base flag set.

The above treats the non-virtual and virtual base flags differently; they should have the following meaning:

Similarly the public and non-public flags mean:

My thinking is that for dynamic_cast, having such information will allow pruning parts of the inheritance graph walk. For instance, there can only be distinct multiple target base objects when the non-diamond shaped flag is set in the complete object. When we find them, the base sub-object started from can only be a common base for both of them, if the diamond shaped flag is set in the complete object. Alternatively, there can only be (at most) one instance of the target type when the non-diamond shaped flag is clear. When we find it via a non-public path, there could only be an alternative public path if the complete object has the diamond shaped flag set. Similar pruning should be possible for catch matching. Without such information, the graph walk has to be pessimistic, which I believe will slow down the common case.

[000126 CodeSourcery -- Nathan] __si_class_type_info is documented for a single non-virtual hierarchy, and __vmi_class_type_info for a class containing (directly or indirectly) a multiple or virtual inheritance component. My mistake was to use __si_class_type_info for a class with a single base, regardless of the hierarchy within the base (that is the current g++ behaviour).

__si_class_type_info is for both public and non-public inheritance (again, something I'd not noticed, thinking it was for public only). For this to work, the __class_type_info flag bit 0x8 'non-publicly inherited base' must mean `non-publicly inherited direct base'. Please can the wording about bases here explicitly say `direct base,' `indirect base,' or `direct or indirect base.' The descriptions currently use `contains' and `has' which are open to interpretation.

In dynamic casting, access is important. In a cross cast from base A via complete type C to another base B, both B and A must be publicly accessible from C. It might be that dynamic_cast locates B, and, knowing that C does not have multiply inherited subobjects, determines it need look no further. However, it must determine access. If C has no non-public direct or indirect bases, access must be OK, without further inspection. However the hint flag 0x8 can't be indicating that, as it is only for direct bases. (This was the one case where I was able to take advantage of these flags, but alas it seems I can't.)
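
The access requirement on cross-casts can be demonstrated with standard C++. In this sketch the class names and helper functions are hypothetical; it shows only the language-level behavior that the RTTI flags are meant to help implement, not how any particular runtime walks the graph.

```cpp
#include <cassert>

struct A { virtual ~A() {} };
struct B { virtual ~B() {} };

struct C : A, B {};           // both bases public
struct D : A, private B {};   // B is not publicly accessible from D

// Cross-cast from the A subobject of a C to its B subobject: succeeds.
inline bool cross_cast_with_public_bases() {
    C c;
    A* a = &c;
    return dynamic_cast<B*>(a) != nullptr;
}

// Same cast inside a D: the B base is private, so the cast yields null.
inline bool cross_cast_with_private_base() {
    D d;
    A* a = &d;
    return dynamic_cast<B*>(a) != nullptr;
}
```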

[000127 All] We decided on Thursday that your "mistakes" are what we want. __si_class_type_info will be for any class with a single direct base at offset 0 which is public and non-virtual.

We also decided that the flags should move from __class_type_info into __vmi_class_type_info, and that the polymorphic flag should be removed.

[000126 CodeSourcery -- Nathan] I think this moving of the flags is a mistake. If I understood correctly, they indicated information about direct and indirect bases (whether there was virtuality anywhere in the hierarchy for instance). Such information can speed up dynamic cast. When walking the inheritance graph, we can take some early outs, if we know there are no multiple subobject types within the complete graph. With the flags in every class's type_info, it becomes easier to get hold of that info. With it only for vmi classes, we have to remember `unknown' when presented with a complete object of si type, and fill the information in when/if we find a vmi base.

Another case is in a potential cross-cast case, which I had in the previous email. Suppose we've found the target base, which we know is unique, but not found the source base (because we early outed, maybe). To be a valid cross-cast both the source and target base objects must be public in the complete object. If we know the complete hierarchy has no non-public bases, there's no need to search for the source base in this case.


[000129 Cygnus -- Jason] So what you're saying is if we try to dynamic_cast from A* to B*, where B has a unique A subobject and the A* does not actually point to part of a B, if we know that B has no multiple subobjects we can check the passed offset, see that it doesn't match, and return failure. Without that information, we would have to recurse up the single-inheritance chain until we either reach the A or a class with multiple or virtual bases.

I think I'd rather pay that small performance hit than add a word to the type_info for each class. Matt, would this affect locales?

... cross-casts only come up in the context of classes with multiple bases, so it wouldn't make sense to look for this in single inheritance classes anyway.


[000127 All] Note from the meeting: A proposed precise definition of a diamond-shaped object is one that has two different direct bases with the same virtual base, directly, indirectly, or vacuously (the direct base is the virtual base).


[000203 All] Move the flags from __class_type_info to __vmi_class_type_info. Share them with one byte from the __base_class_info offset field. Replace Daveed's set with Nathan's, but the first one isn't needed.


[000203 SGI -- Jim] The class type restructuring is a bit different than what I expected going in (could just be my confusion).

I moved the flags from __class_type_info to __vmi_class_type_info, discovering that they don't need to share space with the offset field in the __base_class_info records, but rather with the base class count. But, the __base_class_info has its own flags (virtual and public) which can reasonably share a doubleword, as we were discussing for the other flags this morning. So I specified that. Note that I put the flags in the low byte rather than the high byte. That is because the offset is signed, and it is likely that implementations will sign-extend (signed doubleword>>8), but not (doubleword & 0x00ffffffffffffffll).
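
A sketch of sharing one signed doubleword between the offset and the flags, with the flags in the low byte as described above. The mask values and all function names are assumptions made for illustration; the text here does not give them. Right-shifting a negative value is implementation-defined before C++20, so the sketch relies on the arithmetic shift that the paragraph above expects implementations to use.

```cpp
#include <cassert>
#include <cstdint>

// Assumed flag values for one base class record (hypothetical).
const std::int64_t kVirtualBaseFlag = 0x1;
const std::int64_t kPublicBaseFlag = 0x2;

// Combine a signed offset with flags held in the low byte.
inline std::int64_t pack_offset_flags(std::int64_t offset, std::int64_t flags) {
    assert((flags & ~static_cast<std::int64_t>(0xff)) == 0);
    return offset * 256 + flags;  // multiply rather than shift a negative value
}

// Sign-extending extraction: (signed doubleword >> 8).
inline std::int64_t unpack_offset(std::int64_t packed) { return packed >> 8; }

// Flags are simply the low byte.
inline std::int64_t unpack_flags(std::int64_t packed) { return packed & 0xff; }
```

Had the flags been placed in the high byte, recovering a negative offset would need an explicit sign extension after masking, which is the cost the low-byte layout avoids.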

After an exchange with Nathan, I reinstated his first flag (contains non-diamond multiple inheritance).


[000210 All -- Matt] Notes from the meeting:

Minor corrections to RTTI discussion in data layout document: In section 7c, which describes the vmi_flags, flag 0x01 is documented incorrectly. It says "class has non-diamond multiple inheritance", which isn't quite right. We're really talking more about repeated inheritance: having multiple subobjects of the same type.

Also in vmi_flags, Jason questions whether flags 0x04 and 0x08 are necessary. What do we really need "has virtual base(s)" and "has non-virtual base(s)" for? Jason has sent email to Nathan about this.

Naming issue: we decided to put all of our type_info subclasses in namespace abi, not namespace std. This means, of course, that they can't go in any of the standard headers. Rather than inventing multiple header names, we would like to put everything (unwinding longjmp, type_info subclasses, etc.) into one quasi-standard header. We propose the name . Everything in that header will be in namespace abi.

Issue A23 can almost be closed. The only thing we need to resolve is whether to keep the two flags that Jason is unsure about.


[000302 All -- Matt] We will tentatively keep the has-public-base flag. Nathan has an action item to validate its usefulness when he implements.

# Issue Class Status Source Opened Closed
A-24 RTTI for incomplete types data closed CodeSourcery 000126 000330
Summary: How does RTTI represent incomplete types?
Resolution: Use class_type_info distinct from the complete type copy, add a flag to pointer_type_info if it points to incomplete type RTTI, and do mangled name comparison if an incomplete pointer is involved.

[000126 CodeSourcery -- Nathan] The amended (25th Jan) RTTI specification says:

Note that the full structure described by an RTTI descriptor may include incomplete types not required by the Standard to be completed, although not in contexts where it would cause ambiguity.

I don't believe this is the case; the example I posted a couple of weeks back pointed this out. Here it is, in a slightly more compact form:

        #include <stdlib.h>   /* for abort() */

        struct A;
        struct B;

        int main ()
        {
          try {
            throw (B **)0;
          } catch (A const * const *) {
            abort ();
          } catch (B const * const *) {
            ;//ok
          } catch (...) {
            abort ();
          }
        }

I believe this is well formed and should not abort. The RTTI document indicates that `typeid (A const * const *)' and `typeid (B const * const *)' will produce __pointer_type_info chains that end at a weak symbol reference for A and B respectively. These will both resolve to zero. How is catch matching able to determine the difference between `A const * const *' and `B const * const *' under these circumstances? If this is a shortcoming of the ABI, or considered a defect in the standard, it should be documented.

There seems to be no discussion of this case.


[000127 All] We decided on Thursday that this can be handled by not emitting info for A and B, just referring to them using weak references. The EH matcher will never look past the inner pointers.


[000128 CodeSourcery -- Nathan] I'm sorry, I'm just not getting this. The type_infos for `B **' and `B *' will be, (I'm using g++'s existing name mangling, but these are new-abi structures):

__tiPP1B:
        .long   __vt_19__pointer_type_info
        .long   .LC2
        .long   0
        .long   __tiP1B

__tiP1B:
        .long   __vt_19__pointer_type_info
        .long   .LC3
        .long   0
        .long   __ti1B  ;; not emitted, will resolve to zero

In the catch matching, the type_infos for `A const *const *' and `A const *' will be:

__tiPCPC1A:
        .long   __vt_19__pointer_type_info
        .long   .LC1
        .long   1
        .long   __tiPC1A

__tiPC1A:
        .long   __vt_19__pointer_type_info
        .long   .LC4
        .long   1
        .long   __ti1A ;; not emitted, will resolve to zero

and those for `B const *const *' and `B const *':

__tiPCPC1B:
        .long   __vt_19__pointer_type_info
        .long   .LC0
        .long   1
        .long   __tiPC1B

__tiPC1B:
        .long   __vt_19__pointer_type_info
        .long   .LC5
        .long   1
        .long   __ti1B ;; not emitted, will resolve to zero

I fail to see how the catch matcher can get different results comparing __tiPP1B to __tiPCPC1A as opposed to comparing __tiPP1B to __tiPCPC1B. They both look like qualification conversions of pointers to pointers to incomplete type. In the first case we'll end up comparing __tiP1B to __tiPC1A, which still is a valid qualification conversion, then have two NULL pointers for the pointed to types, which somehow we have to tell apart. In the second case we'll end up comparing __tiP1B to __tiPC1B, and again have two NULL pointers for the pointed to types, but this time we have to consider them the same type. I don't see anything in [conv.qual] saying that qualification conversions don't have to deal with incomplete types. N.B.: old-abi g++ seg faults on the above code because it does wander into the NULL pointers.


[000129 Cygnus -- Jason] Good point. I was forgetting about multi-level qualification conversions.

I think that leaves us with something like what EDG does now: namely, comparisons are done by comparing the addresses of one-byte commons rather than of the type_info nodes themselves. Then we could emit incomplete info in one file and complete info in another file and they would compare the same because both refer to the same ID proxy.

We could mangle the complete and incomplete versions differently, so they would not be combined by the linker.

This would also change how we refer to type_infos; under the current scheme, references to type_infos in the EH type table need to be via relocs that will be resolved by the dynamic linker at runtime. If we don't need to compare addresses, we could use gp-relative references. Of course, we'd still have the absolute references in the type_infos to the ID proxies, so we're no better off.


[000130 CodeSourcery -- Nathan] There's a bit of strangeness with loading & unloading a DSO which contains the complete definition of `struct A', into an executable which has the incomplete info. That too is in the original email. If both DSO and executable have __tiP1A (struct A *), they'll be merged, presumably with the DSO's copy ignored. However, the __tiP1A in the executable will point at the proxy incomplete A type_info (which will have already been filled with a weak NULL for its target). Somehow we have to arrange that the proxy is altered to now point at the __ti1A (struct A) type_info that the DSO supplied. If we don't do that, throwing `struct A *' in the DSO (which is valid, `cos the DSO source had complete information), will throw the __tiP1A in the executable which points to incomplete. Hence we won't find any base conversions if we're trying to catch a base of A.


[000203 All] We can't seem to get around the need for an EDG-style implementation, i.e. a proxy for the type RTTI which is resolved by name, e.g. a one-byte common block referenced from the RTTI. We need a specific proposal for putting the reference in the RTTI, and a mangling for the name.

Since all we need from the common block is a distinct address, we may want to float a base ABI proposal for a new symbol type which is resolved by the linkers to a unique address without allocating storage.


[000210 All -- Matt] The scheme we have been converging on: we extend __class_type_info by putting in a new field, id_proxy_ptr, of type char*. It points to a one-byte comdat which serves only as a unique address. (We don't see a strong need to ask the base ABI group to mandate a magic unique-address feature in the linker. We may want to get input from our linker people, though.)

A class's __class_type_info object and its comdat proxy both receive mangled names. We must make sure that the proxy's mangled name is the same for all complete and incomplete declarations of a class, that the mangled name of the __class_type_info object is the same for all complete declarations of a class, and that the mangled name of the __class_type_info object is different for incomplete declarations than for complete declarations. One way to achieve this is to make __class_type_info objects for incomplete declarations static.

We add a new flag to __pointer_type_info; let's say bit 0x4. If this is set, it means we have a pointer to an incomplete type (or pointer to pointer to incomplete type, etc.)

We compare two __class_type_infos for equality by pointer comparison of the id_proxy_ptr fields. We compare two __pointer_type_infos for equality by looking at the addresses of the type_info objects, *unless* the incomplete bit is set in at least one of them. If the incomplete bit is set, we have to compare the pointed-to types. For everything other than classes and pointers we can just use address equality of the type_info objects themselves.
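
The comparison rules in the preceding paragraph can be sketched as follows. TypeInfoSketch and same_type are hypothetical stand-ins for the real RTTI classes, and the check that the remaining flag bits agree is an assumption that the notes do not spell out.

```cpp
#include <cassert>

const unsigned kIncompleteFlag = 0x4;  // bit suggested in the notes above

// Flattened stand-in for the type_info hierarchy.
struct TypeInfoSketch {
    enum Kind { kOther, kClass, kPointer };
    Kind kind;
    const char* id_proxy_ptr;        // kClass only: address of the one-byte proxy
    unsigned flags;                  // kPointer only
    const TypeInfoSketch* target;    // kPointer only: pointed-to type
};

inline bool same_type(const TypeInfoSketch* a, const TypeInfoSketch* b) {
    assert(a != nullptr && b != nullptr);
    if (a == b) return true;                   // address equality always suffices
    if (a->kind != b->kind) return false;
    if (a->kind == TypeInfoSketch::kClass)     // classes: compare proxy addresses
        return a->id_proxy_ptr == b->id_proxy_ptr;
    if (a->kind == TypeInfoSketch::kPointer) {
        // Distinct addresses and no incomplete bit: distinct types.
        if (((a->flags | b->flags) & kIncompleteFlag) == 0) return false;
        // Assumption: the other flag bits must also agree.
        if ((a->flags & ~kIncompleteFlag) != (b->flags & ~kIncompleteFlag))
            return false;
        return same_type(a->target, b->target); // compare pointed-to types
    }
    return false;                              // everything else: address only
}
```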

In response to Jason's 000129 question: we can't use gp-relative references for type_info objects because we're only using comdat proxies for __class_type_info, not for other kinds of type_info objects.

In response to Nathan's 000130 question: this is the reason to give the complete and incomplete __class_type_info objects different mangled names. That way a complete __class_type_info object in a DSO won't be overridden by an incomplete __class_type_info object in the executable.

At the very end of this meeting we got a suggestion from Christophe for a completely different mechanism. We agreed that we can't evaluate it without a writeup. The suggestion: abandon these comdat proxies altogether. Instead we have a new type_info class, __incomplete_class_type_info. Comparisons involving two __class_type_info objects use address equality, comparisons involving two __incomplete_class_type_info objects, or a __class_type_info and an __incomplete_class_type_info, do string comparison on the name. We still would have an incomplete bit in the __pointer_type_info class, which, again, we would use to determine whether two __pointer_type_info objects with different addresses might nevertheless represent the same pointer type.
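
A minimal sketch of the comparison rule in this alternative, assuming each descriptor carries its mangled name. ClassInfoSketch and same_class are hypothetical; a single struct with an incomplete flag stands in for the two proposed classes.

```cpp
#include <cassert>
#include <cstring>

// One struct models both __class_type_info and the proposed
// __incomplete_class_type_info; 'incomplete' tells them apart.
struct ClassInfoSketch {
    const char* mangled_name;
    bool incomplete;
};

inline bool same_class(const ClassInfoSketch& a, const ClassInfoSketch& b) {
    assert(a.mangled_name != nullptr && b.mangled_name != nullptr);
    // Two complete descriptors: address equality is enough.
    if (!a.incomplete && !b.incomplete) return &a == &b;
    // Otherwise fall back to comparing names.
    return std::strcmp(a.mangled_name, b.mangled_name) == 0;
}
```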


[000309 All] The group decided to go ahead and close this issue with the proxy solution. If Christophe comes up with a writeup of the alternate proposal, we can reopen.


[000314 SGI -- Jim] I've incorporated the chosen scheme into the Draft C++ ABI for IA-64. In working this out, though, I've remembered why SGI had an issue with the proxy commons, which is that, in large programs with lots of class types, they produce a lot of runtime relocation scattered through data. Matt and I think we understand the representation of Christophe's proposal, and will think about how to compare the mangled names.


[000330 All] Adopt the proposed scheme. Make sure Nathan understands it.

# Issue Class Status Source Opened Closed
A-25 Excess-width bitfields data closed IBM 000204 000217
Summary: C++ allows bitfields with a larger size specified than that required by the declared type, e.g. int f: 64. How should they be allocated?
Resolution: Allocate the field with alignment determined as though it were the largest integer type that fits in the specified size, and use the first bits available in the field (lowest order for little endian IA-64) for the actual data.

When the specified width of a bitfield exceeds the size of the declared type, the standard specifies that the accessible field is to be padded to the specified width, with the location of the padding implementation-defined. That is, the accessible field could be placed at the beginning, at the end, or in the middle of the specified bits. (Note that such declarations are explicitly disallowed by the C 2000 draft, so this is not a C ABI issue.)

[000204 SGI -- Jim] It seems to me that the situation that makes it interesting is the following:

        struct s {
          short s1;
          int i: 64;
          short s2;
        };
In this case, I don't want the accessible part of i at the beginning or the end -- I want it in the middle. Doing otherwise yields either a badly aligned i, or wasted space.

One could express this by the following rule:

Place the accessible part of the bitfield object as if it were a non-bitfield member of the declared type, i.e. at the next available offset of the appropriate alignment. Allocate the full bitfield at the earliest available offset where it will include the accessible part.
[000204 IBM -- Mark] I disagree. If the user wants the bitfield to be aligned in a certain place, he has the tools to do so. He can certainly pick a different size bitfield. I think that this should be aligned as if it is the same size as the type, and then the extra bits put somewhere. Putting them afterwards is probably simpler than before, or splitting it in the middle.

[000217 All] The rationale for the solution chosen is that the most likely reason for using this feature is to achieve a known allocation for an enum type when the user does not know how big compilers will make it. Thus, we want "enum ... e : 32;" to behave as though the compiler allocated a 32-bit int, even if it actually uses only 8 bits for the enum value.
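
The arithmetic behind the resolution can be sketched as below, assuming integer types of 8, 16, 32, and 64 bits whose alignment equals their size. container_bits and excess_bitfield_offset are hypothetical helpers that model only where the field starts, not how a compiler lays out the rest of the class.

```cpp
#include <cassert>

// Largest of 8, 16, 32, 64 that fits in the specified width.
inline unsigned container_bits(unsigned width_bits) {
    assert(width_bits >= 8);
    unsigned bits = 64;
    while (bits > width_bits) bits /= 2;
    return bits;
}

// Byte offset of the field, given the first free byte in the struct.
inline unsigned excess_bitfield_offset(unsigned next_free_byte,
                                       unsigned width_bits) {
    unsigned align = container_bits(width_bits) / 8;
    return (next_free_byte + align - 1) / align * align;  // round up
}
```

For struct s above, s1 occupies bytes 0 and 1, so excess_bitfield_offset(2, 64) places i at byte 8 under these assumptions.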

# Issue Class Status Source Opened Closed
A-26 NULL pointers to member functions data closed CodeSourcery 000221 000302
Summary: How are NULL pointers to member functions represented?
Resolution: A NULL pointer is represented by a 0 value of ptr, and the value of adj is irrelevant.

[000221 CodeSourcery -- Mark] The ABI document says that a NULL pointer-to-member function has `ptr == 0'. It does not, however, say whether or not a NULL pointer-to-member function also has `adj == 0'.

I believe that this should be specified as well so that code generated to do comparison of pointers to members (of the same type) looks like:

p1->ptr == p2->ptr && p1->adj == p2->adj
and not:
p1->ptr == p2->ptr && (!p1->ptr || (p1->adj == p2->adj))

So, I would say:

If the pointer-to-member is NULL, both fields are zero. (Note: there are no non-NULL pointers-to-members for which the `ptr' field is zero.)

It's occurred to me that this imposes some overhead on casting pointers-to-members around: now when you convert from a base pointer to member to a derived version (or vice versa), you can't just adjust the `adj' member willy-nilly; instead, you have to check first whether or not the pointer is NULL.

So, I'm not sure any more which scheme is preferable -- but we definitely need to say clearly which we want.

[000222 CodeSourcery -- Mark] So, it would be helpful if we were to add:

(Note: the `adj' field is not necessarily zero even when the pointer-to-member is NULL. Therefore, casting a pointer-to-derived-member to a pointer-to-base-member (or vice versa) requires only an adjustment to the `adj' field. However, comparison of two pointers-to-members requires more than a bitwise comparison. Code equivalent to:
p1.ptr == p2.ptr && (!p1.ptr || (p1.adj == p2.adj))
is required since in the case that p1.ptr and p2.ptr are both zero, their `adj' fields are irrelevant.)
to the ABI document.

[000229 SGI -- Jim] Comparisons (5.10) of pointers to virtual member functions are unspecified. So, for pointer-to-function-member comparisons, we only need to worry about non-virtual members and null. Since the representation stores the actual address of the function descriptor, we should be able to just compare the pointers, and ignore the adjustment.

For conversions between base classes, it seems that we need only modify the adjustment, and then only if one is not primary for the other. For conversion to null, it seems that we need only set the pointer to 0, and can ignore the adjustment.

[000302 All] Represent NULL by a 0 pointer, with the adjustment unspecified.
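
A sketch of what the resolution means for generated code, using a two-field struct as a stand-in for the pointer-to-member-function representation. MemFnPtrSketch and the helper names are hypothetical, and the comparison shown is the conservative form from the 000222 note.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the {ptr, adj} pair discussed above.
struct MemFnPtrSketch {
    std::ptrdiff_t ptr;
    std::ptrdiff_t adj;
};

// NULL is decided by ptr alone; adj is unspecified.
inline bool is_null_member_fn(const MemFnPtrSketch& p) { return p.ptr == 0; }

// Casting between base and derived only touches adj, with no NULL check.
inline MemFnPtrSketch adjust_for_cast(MemFnPtrSketch p, std::ptrdiff_t delta) {
    p.adj += delta;
    return p;
}

// Equality must ignore adj when both pointers are NULL.
inline bool same_member_fn(const MemFnPtrSketch& a, const MemFnPtrSketch& b) {
    return a.ptr == b.ptr && (a.ptr == 0 || a.adj == b.adj);
}
```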

# Issue Class Status Source Opened Closed
A-27 NULL pointers to data members data closed CodeSourcery 000222 000302
Summary: How are NULL pointers to member data represented?
Resolution: A NULL pointer is represented by the value -1.

[000222 CodeSourcery -- Mark] We haven't specified a way to represent a NULL pointer to data member. G++ presently adds one to the offset, allowing zero to serve as the NULL pointer to member.

[000223 CodeSourcery -- Mark] What is the value for the NULL pointer to data member? I guess -1 would do, unless there are cases I can't think of where the pointer to member would legitimately have a negative value. Maybe 0x8000000000000000 is better...

[000229 SGI -- Jim] From the Standard:

So we can conclude that, since we always allocate non-virtual bases before data members, any base object in a derivation chain will have its base address smaller than any of the data members declared in members of the chain. Therefore, the offset represented by a pointer-to-data-member will always be non-negative, even after the permitted conversions above.

So, we could either use -1 for NULL, or use 0 and increment the offset. 0x800...000 is an unnecessary complication.

[000302 All] Represent NULL by the value -1.
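
A sketch of the chosen representation, modeling a pointer to data member as a plain signed offset. The names are hypothetical. Because valid offsets are never negative, -1 cannot collide with a real member, but a conversion that adds a base offset has to leave NULL untouched.

```cpp
#include <cassert>
#include <cstddef>

const std::ptrdiff_t kNullDataMember = -1;  // the value chosen above

inline bool is_null_data_member(std::ptrdiff_t pm) {
    return pm == kNullDataMember;
}

// Convert a pointer to member of a base into a pointer to member of a
// derived class whose base subobject sits at base_offset.
inline std::ptrdiff_t base_to_derived_member(std::ptrdiff_t pm,
                                             std::ptrdiff_t base_offset) {
    assert(base_offset >= 0);
    if (is_null_data_member(pm)) return kNullDataMember;  // NULL stays NULL
    assert(pm >= 0);
    return pm + base_offset;
}
```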

# Issue Class Status Source Opened Closed
A-28 RTTI equality testing data closed CodeSourcery 000406 000504
Summary: Can we get back the ability to do a simple test for RTTI equality?
Resolution: Mangle the name NTBS for std::type_info separately, emit it in its own COMDAT, and use it instead of the RTTI struct, at least if the incomplete flags are set in pointer types.

[000406 CodeSourcery -- Nathan] The current RTTI proposal loses the property that all type_info objects can be compared for equality and orderability by address comparison. Instead, type_info::operator== must involve a virtual function call or unconditionally strcmp. (An alternative of testing the typeid of the polymorphic type_info objects results in infinite recursion!)

Here are two proposals which reinstate the address equality property. The first is rather different to the current scheme, but when I was done documenting it, I realised there was a minor modification to the current scheme, which partially reinstates the address equality. I present both for consideration. Feel free to shoot them down ...

Proposal A

  1. The typeid operator produces a std::type_info object for all types. No subclassing of std::type_info is done. The object has comdat linkage, and hence after linking and loading, only one object of that name is active. For typeid(X) it does not matter whether X is incomplete, or direct or indirect pointer to incomplete. The functionality required of typeid is to produce objects which can test for type equality and (implementation defined) type orderability. No information about the internal structure of the type is required.

  2. Dynamic_cast and catch matching require more information. Primarily the hierarchy of a class type, and the target of pointer types. To do this, a separate class hierarchy is used. These objects are also emitted with comdat linkage, and with a different name to the std::type_info objects produced by typeid. (It is not _necessary_ for these to have comdat linkage, but that will reduce overall program size.)

    The base class of these is:

    class abi::__type_info
    {
      std::type_info const *type; // pointer to typeid(foo) object.
      virtual ~__type_info ();
      ... other implementation defined member functions
    };
    
    

    This contains a pointer to the type_info object produced by the typeid operator, for whatever type this is describing. That will be a unique object.

    There are a number of necessary derivations of this type, which can be taken largely unaltered from the current proposal.

    It is necessary to distinguish function types, so that catch matching can distinguish a data pointer object from a function pointer object. Other types (fundamental, enum, array) need not be distinguished, and can be represented by an abi::__type_info object. (Or we could keep the current proposal of having separate derivations for these.)

    class abi::__function_type_info
      : public abi::__type_info
    {
      virtual ~__function_type_info ();
      ... other implementation defined member functions
    };
    
    

    Pointers are as they currently are, other than the base class change. We still need the incomplete target flag.

    class abi::__pointer_type_info
      : public abi::__type_info 
    {
      abi::__type_info const *target;   // target type of the pointer
      unsigned flags;                   // flags, as currently specified
      virtual ~__pointer_type_info ();
      ... other implementation defined member functions
    };
    
    

    Pointers to member could be a sibling class of non member pointers. However, they do share common functionality, and IMO it makes sense to derive from __pointer_type_info.

    class abi::__pointer_to_member_type_info
      : public abi::__pointer_type_info
    {
      abi::__class_type_info const *klass;  // class of the member
      virtual ~__pointer_to_member_type_info ();
      ... other implementation defined member functions
    };
    
    

    The __class_type_info, __si_class_type_info and __vmi_class_type_info are unchanged, other than the change to __class_type_info's base.

    class abi::__class_type_info
      : public abi::__type_info
    {
      ... as currently defined
    };
    
    

The vtable slot -1, (which currently holds a pointer to the std::type_info object for a class), points to the abi::__class_type_info object. To implement typeid(X), where X is polymorphic, involves an additional indirection through the abi::__type_info base to return the `type' member.

dynamic_cast uses the abi::__class_type_info object pointed to in the vtable. throwing and catch matching use the abi::__type_info object for the type being thrown or caught.

As with the current proposal, an incomplete type is represented by an abi::__class_type_info object. Note that its abi::__type_info base will point to the unique std::type_info object for that type, regardless of whether a DSO completes the type. This incomplete type is prevented from preempting the complete type information.

Also direct or indirect pointers to incomplete have their incomplete flag set, and are also prevented from preempting the equivalent pointer to complete object.

During catch matching, comparison of pointers can compare the abi::__pointer_type_info addresses, unless either has the incomplete flag set, in which case the std::type_info objects pointed to must be compared. (The std::type_info objects could be compared even when the incomplete flags are clear.)
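The comparison rule for pointer types under Proposal A might be sketched as follows. This is only an illustration: everything in namespace `sketch` is a hypothetical stand-in for the classes named above, and the flag value is an assumption, not the value in the ABI draft.

```cpp
namespace sketch {

// Stand-in for the unique std::type_info object produced by typeid.
struct unique_type_info { const char* name; };

// Stand-in for abi::__type_info: carries a pointer to the unique object.
struct abi_type_info {
    const unique_type_info* type;
    explicit abi_type_info(const unique_type_info* t) : type(t) {}
    virtual ~abi_type_info() {}
};

// Stand-in for abi::__pointer_type_info.
struct abi_pointer_type_info : abi_type_info {
    enum { incomplete_flag = 0x8 };   // assumed value, for illustration only
    const abi_type_info* target;      // target type of the pointer
    unsigned flags;
    abi_pointer_type_info(const unique_type_info* t,
                          const abi_type_info* tgt, unsigned f)
        : abi_type_info(t), target(tgt), flags(f) {}
};

// Catch-matching comparison: descriptor addresses suffice unless either
// side is marked incomplete, in which case the unique objects are compared.
inline bool same_pointer_type(const abi_pointer_type_info& a,
                              const abi_pointer_type_info& b) {
    if ((a.flags | b.flags) & abi_pointer_type_info::incomplete_flag)
        return a.type == b.type;
    return &a == &b;
}

}  // namespace sketch
```

A complete and an incomplete descriptor for the same pointer type compare equal here because both point at the same unique object, even though the descriptors themselves live at different addresses.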

There are two or three naming schemes with this proposal:

  1. The naming of the std::type_info object produced by typeid.
  2. The naming of the abi::__type_info object required for dynamic cast and catch matching
  3. Optionally, the naming of the incomplete abi::__class_type_info and direct or indirect pointers to it. If that mangling is specified, we can emit those as comdat objects too, rather than forcing them to be statics.

Advantages of this proposal are:

The cost of this proposal is

Proposal B

The first proposal is essentially using the std::type_info objects as unique objects, via which incomplete types can be compared. We already have such a unique object candidate -- the NTBS name member of std::type_info. Currently we've not said anything about that. If, however, we give that NTBS comdat linkage, a unique name, and prevent it being commonized with other strings, we have a proxy. These features can be obtained by treating it as a `const char []' rather than a string constant. type_info equality and orderability can now use the address of this array, rather than the type_info objects themselves. We can do this in all cases, even though it is only necessary for the pointer to incomplete case, as that avoids a virtual function call. Here is an implementation of type_info::operator==:

bool type_info::operator== (type_info const &other) throw ()
{
  return name == other.name;
}

We need to specify the naming scheme for the NTBS.
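A self-contained sketch of the idea behind Proposal B follows. The class `sketch::type_info` and the arrays `name_X` and `name_Y` are hypothetical; a real implementation would give the arrays comdat linkage and a mangled name, which plain C++ cannot express, so ordinary named arrays stand in for them here.

```cpp
#include <functional>

namespace sketch {

// Named arrays rather than string literals: each has exactly one address
// and cannot be merged with other strings by the compiler.
const char name_X[] = "1X";
const char name_Y[] = "1Y";

class type_info {
public:
    explicit type_info(const char* unique_name) : name_(unique_name) {}

    // Equality is a single pointer comparison; no virtual call, no strcmp.
    bool operator==(const type_info& other) const {
        return name_ == other.name_;
    }

    // Ordering by address; std::less gives a total order over pointers.
    bool before(const type_info& other) const {
        return std::less<const char*>()(name_, other.name_);
    }

    const char* name() const { return name_; }

private:
    const char* name_;
};

}  // namespace sketch
```

Two distinct `type_info` objects describing the same type (for instance, one emitted for a pointer to incomplete) compare equal as long as both refer to the same name array.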

The advantages of this are

The costs over proposal A are


[000411 CodeSourcery -- Nathan]

Issue 2

The algorithm for collation order of type_infos, cannot simply compare addresses for non-pointer types, and complete pointer types. Using string collation only works when one of the types is a pointer with the incomplete_mask set. There are two difficulties. Firstly, we might be comparing a non-pointer type_info with a pointer type_info. We need to determine this and DTRT WRT the incomplete flag of the pointer type_info. To do that will require dynamic_cast or typeid'ing the type_infos. Secondly, assume we are just comparing pointer type_info's. We have two pointers to complete, Aptr and Bptr, and a third pointer to incomplete, Cptr.

  1. Aptr.before (Bptr) can just compare addresses.
  2. Bptr.before (Cptr) will compare names.
  3. Cptr.before (Aptr) will compare names.

There is nothing maintaining the consistency of the results of these three tests -- result 1 is uncorrelated with results 2 & 3.

Therefore type_info::before must be implemented as string compare on the type's names. We lose any advantage of commonizing the type_infos.
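A minimal sketch of the string-based ordering, assuming a hypothetical `named_type` record that holds only the type's name. Because every comparison uses the same criterion, the three tests listed above can no longer disagree with one another.

```cpp
#include <cstring>

namespace sketch {

// Hypothetical stand-in for a type_info that exposes only its name.
struct named_type { const char* name; };

// Collation by name: one criterion for every pair, so the order is
// consistent regardless of where the objects happen to be placed.
inline bool before(const named_type& a, const named_type& b) {
    return std::strcmp(a.name, b.name) < 0;
}

}  // namespace sketch
```

The price, as noted, is a string comparison on every call, whether or not the type_info objects themselves were commonized.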

Issue 3

17.4.4.4 prevents an implementation adding member functions to one of the std classes, except in particular circumstances. About the only leeway given is whether a particular non-virtual function is inline or not. So I presume we're not permitted to add virtual member functions to std::type_info (18.5.1). The rules given in 17.4.4.4 specifying what member functions can be added look like applications of the as-if rule, but there must be something deeper going on, as if that was all, it wouldn't be mentioned. I'm not sure how a conforming program could tell whether additional functions had been added.

The abi requires us to add virtual functions to type_info. For instance the implementation of operator== will require it to deal with pointers to incomplete. G++ needs several for catch matching.

Issue 4

5.2.8 talks about typeid returning something derived from type_info, but the footnote mentioning extended_type_info implies to me that typeid always returns objects of the same type. Again, I'm not sure how a conforming program could tell.

The two proposals above resolve these issues. Proposal A resolves issues 2, 3 and 4, whilst proposal B resolves issue 2 only, and will leave us (slightly) non-conformant.


[000413 All] The Standard committee members in the group are quite sure that Issues 3 and 4 are not problems. Section 17.4.4.4 does not impose the suggested constraint (see footnote 173), and the intent of 5.2.8 is not to restrict typeid to returning a single class.

Proposal B resolves the remaining issue, and the group is inclined to accept it, while considering whether to go further with A. Jim will integrate (and has integrated) B into the Draft C++ ABI for IA-64.


[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
A-29 RTTI pointer-to-member data closed CodeSourcery 000407 000504
Summary: Derive __pointer_to_member_type_info from __pointer_type_info.
Resolution: Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers).

[000407 CodeSourcery -- Nathan] __pointer_to_member_type_info is derived from type_info. I strongly recommend it be derived from __pointer_type_info, as it requires much of the same functionality, and has the same meanings of its flags. By subclassing __pointer_type_info, much code could be reused.

Thus point 8 of the rtti classes would become

The abi::__pointer_to_member_type_info type adds one field to abi::__pointer_type_info:


[000411 CodeSourcery -- Nathan] It is permissible in a pointer to member of X, for X to be an incomplete type [8.3.3]/2. This means that we need more than a single incomplete flag. The presence of such a ptr to member, will mean that it and all pointers to it will have their incomplete flag set, but its target might not be an incomplete chain. In implementing G++'s rtti runtime I found the following three flags useful, (this is with __pointer_to_member_type_info derived from __pointer_type_info):

incomplete_mask       = 0x8
incomplete_chain_mask = 0x10
incomplete_klass_mask = 0x20

incomplete_mask is an inclusive or of the other two flags. incomplete_klass_mask is only used by __pointer_to_member_type_info, and __pointer_type_info knows nothing about it (it simply examines the other two).

A __pointer_type_info or __pointer_to_member_type_info sets the incomplete_mask and incomplete_chain_mask, if the target is an incomplete type, or has its incomplete_mask set.

A __pointer_to_member_type_info sets the incomplete_mask and the incomplete_klass_mask, if the class of the member is incomplete.
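The two rules above might be expressed as follows. The mask values are those listed in the message; the function names and parameters are hypothetical and only model how the flags would be computed when a descriptor is emitted.

```cpp
namespace sketch {

enum : unsigned {
    incomplete_mask       = 0x8,
    incomplete_chain_mask = 0x10,
    incomplete_klass_mask = 0x20
};

// Flags for a pointer (or the pointer part of a pointer to member).
// target_flags is 0 when the target is not itself a pointer type.
inline unsigned pointer_flags(bool target_is_incomplete_type,
                              unsigned target_flags) {
    unsigned flags = 0;
    if (target_is_incomplete_type || (target_flags & incomplete_mask))
        flags |= incomplete_mask | incomplete_chain_mask;
    return flags;
}

// A pointer to member additionally considers the class of the member.
inline unsigned pointer_to_member_flags(bool target_is_incomplete_type,
                                        unsigned target_flags,
                                        bool klass_is_incomplete) {
    unsigned flags = pointer_flags(target_is_incomplete_type, target_flags);
    if (klass_is_incomplete)
        flags |= incomplete_mask | incomplete_klass_mask;
    return flags;
}

}  // namespace sketch
```

A pointer to member of an incomplete class gets 0x28, and an ordinary pointer to that pointer to member then gets 0x18: the incompleteness propagates up the chain through incomplete_mask, while incomplete_klass_mask stays on the pointer to member itself.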


[000411 Ed.] I've tentatively incorporated both of these into the layout document, except that I just defined a second flag (in __pointer_type_info flags) for direct or indirect incomplete class type (in member pointers). Any pointer type inspections can check for both flags, even though only member pointers can cause one of them to be set up the chain.


[000413 All] Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers).

(Ed. note) I've added updates to the Draft C++ ABI for IA-64.


[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
A-30 RTTI portability data closed HUB 001012 001109
Summary: What must be specified to produce RTTI portability? Are member layouts specified? Names? Virtual functions?
Resolution: Data members of the ABI-defined type_info derived classes must be allocated as specified, and their names are normative. Virtual functions, beyond the Standard-specified destructor, are implementation-specific, and may not be referenced outside the compiler and system vendors' runtime libraries.

[001012 all -- Jim] The issue here, raised originally by Martin, I will open as A-30. Implementations will generally need additional virtual functions associated with the type_info hierarchy to implement such functionality as dynamic cast. Gcc for instance has functions __is_function_p, __do_catch, __pointer_catch, ...

A program that is built from pieces from different compilers, where the pieces come from different implementations of the hierarchy, will see different structures, at least in the vtables, if we allow this extra material to be arbitrary, creating a problem if such programs actually make use of parts of the hierarchy.

We worked out the following possible solution:

Now an implementation can add an arbitrary set of functions to __cxa_aux_typeinfo, specialized to the derived class like a virtual function, without changing the external interface (to the user) of the hierarchy.

[001103 SGI -- Jim]

[...leaving out much discussion...]

So, after all the above, I suggest the following actions:


[001109 all] The current writeup is adequate. See the resolution in the issue header.

# Issue Class Status Source Opened Closed
A-31 Overlaying tail padding data closed CodeSourcery 001019 001109
Summary: Should we change the decision to overlay tail padding in class layout? For volatile members? In general?
Resolution: The overlaying of tail padding is eliminated, but we will retain the treatment of empty bases.

[001019 CodeSourcery -- Mark] I think I recall that the committee was intentionally trying to use the tail padding of one object to save space. For example, consider:

  struct A { short s; char c; };
  struct B { A a; char d; };
  

(These are PODs, but you can easily make an equivalent non-POD example).

Here, I think the committee wanted to give `B' size 4, by packing `d' into the tail padding of `A'.
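For reference, the layout without overlaying can be checked directly. The sketch below repeats the two structs and states only what holds for PODs on any implementation; the concrete sizes mentioned afterwards assume a typical target with a 2-byte short.

```cpp
#include <cstddef>

struct A { short s; char c; };
struct B { A a; char d; };

// Without overlaying, d is placed after all of a, tail padding included.
static_assert(offsetof(B, d) >= sizeof(A),
              "d does not share storage with a's tail padding");
static_assert(sizeof(B) > sizeof(A),
              "B is strictly larger than A");
```

On such a target sizeof(A) is 4 (one byte of tail padding) and sizeof(B) is 6, rather than the 4 that packing `d' into the padding would give.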

I think this is a mistake. David Gross came up with the following example:

Code generator needs to copy dsize, not sizeof, unless it can prove that the object is in a context where tail padding isn't overlayed. Reason? Tail padding might be overlayed by a volatile field.

Hence, a non-POD that looks like

      struct S { short sh; char ch; };
  

requires ld2/st2/ld1/st1 for a copy instead of ld4/st4 because we might have

      struct T { S s; volatile char d; };
  

Similarly, people using memcpy to copy around POD components of non-PODs will get burned.

This completely breaks user expectation since people routinely expect to be able to stick a function or two into a POD without changing its layout.

I think we should make the following changes:

Note that this still permits the empty base optimization; nvsize will be zero, and sizeof will be 1.

There's an important difference between using the tail padding in an empty base and the tail padding in a generic object: you know that you never have to copy an empty base.


[001109 all] Although dealing with tail padding overlaying would be straightforward in a from-scratch compiler, getting the information to all the places in the back end of g++ or the HP compiler that would need it is a huge task (estimated at a widely scattered 1500 lines of code touched in g++). In addition, it is expected that some number of users moving back and forth between C and C++ and trying to match C structs with C++ non-POD classes will have problems, though there are questions about how many.

Therefore, we have decided to eliminate the overlaying of tail padding. Mark will provide alternate proposed wording for the ABI document.


B. Virtual Function Handling Issues

# Issue Class Status Source Opened Closed
B-1 Adjustment of "this" pointer (e.g. thunks) data call closed SGI 990520 991202
Summary: There are several methods for adjusting the this pointer for a member function call, including thunks or offsets located in the vtable. We need to agree on the mechanism used, and on the location of offsets, if any are needed. To maximize performance on IA64, a slightly unusual approach such as using secondary entry points to perform the adjustment may actually prove interesting.
Resolution: See the writeup in the Draft C++ ABI for IA-64.

[990623 HP -- Christophe]

Open Issues Relevant To This Discussion

  1. Keeping all of a class in a single load module. The vtable contains the target address and one copy of the target GP. This implies that it is not in text, and that it is generated by dld.

  2. Detailed layout of the virtual table.

  3. How can we share class offsets?

1. Scope and "State of the Art"

The following proposal applies only to calls to virtual functions when a this pointer adjustment is required from a base class to a derived class. Essentially, this means multiple inheritance, and the existence of two or more virtual table pointers (vptr) in the complete object. The multiple vptrs are required so that the layout of all bases is unchanged in the complete object. There will be one additional vptr for each base class which already required a vptr, but cannot be placed in the whole object so that it shares its vptr with the whole object. Note: when the vptr is shared, the base class is said to be the "primary base class", and there is only one such class.

For the primary base class, no pointer adjustment is needed. For all other bases, a pointer to the whole object is not a pointer to the base class, so whenever a pointer to the base class is needed, adjustment will occur.
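The adjustment can be observed from ordinary C++. The sketch below uses a hierarchy of the same shape as the examples later in this note; the helper names are hypothetical, and the statement that A sits at offset zero assumes an implementation that makes the first polymorphic base the primary base.

```cpp
#include <cstddef>

struct A { virtual void f() {} virtual ~A() {} int a = 0; };
struct B { virtual void g() {} virtual ~B() {} int b = 0; };
struct C : A, B {};

// Byte distance from the start of the complete object to its A subobject.
inline std::ptrdiff_t offset_of_A_in(C& c) {
    return reinterpret_cast<char*>(static_cast<A*>(&c)) -
           reinterpret_cast<char*>(&c);
}

// Byte distance from the start of the complete object to its B subobject.
inline std::ptrdiff_t offset_of_B_in(C& c) {
    return reinterpret_cast<char*>(static_cast<B*>(&c)) -
           reinterpret_cast<char*>(&c);
}
```

Converting a C* to an A* leaves the address unchanged, while converting to a B* adds a non-zero constant; the reverse adjustment is what a thunk or a vtable offset has to perform when a B* reaches a function that expects a C*.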

In particular, when calling a virtual function, one does not know in advance in which class the function was actually defined. Depending on the actual class of the object pointed to, pointer adjustment may be needed or not, and the pointer adjustment value may vary from class to class. The existing solution is to have the vtable point not to the function itself, but to a "thunk" which does pointer adjustment when needed, and then jumps to the actual function. Another possibility is to have an offset in the vtable, which is used by the called function. However, more often than not, this implies adding zero.

Virtual bases make things slightly more complicated. In that case, the data layout is such that there is only one instance of the virtual base in the whole object. Therefore, the offset from a this pointer to the same virtual base may change along the inheritance tree. This is solved by placing an offset in the virtual table, which is used to adjust the this pointer to the virtual base.

2. Proposal and Rationale

My proposal is to replace thunks with offsets, with two additional tricks:

The thunks are believed to cost more on IA64 than they would on other platforms. The reason is that they are small islands of code spread throughout the code, where you cannot guarantee any cache locality. Since they immediately follow an indirect branch, chances are we will always encounter both a branch misprediction and an I-cache miss in a row.

On the other hand, a virtual function call starts by reading the virtual function address. Reading the offset immediately thereafter should almost never cause a D-cache miss (cache locality should be good). More often than not, no adjustment is needed, or the adjustment will be done at call site correctly. In the worst case scenario, we perform two adjustments, one static at call site, and one dynamic in the callee, but this case should be really infrequent.

3. New Calling Convention

The new calling convention requires that the 'this' pointer on entry points to the class for which the virtual function is just defined. That is, for A::f(), the pointer is an A* when the main entry of the function is reached. If the actual pointer is not an A*, then an adjusting entry point is used, which immediately precedes the function.

In the following, we will assume the following examples:

    struct A { virtual void f(); };
    struct B { virtual void g(); };
    struct C: A, B { };
    struct D : C { virtual void f(); virtual void g(); };
    struct E: Other, C { virtual void f(); virtual void g(); };
    struct F: D, E { virtual void f(); };

    void call_Cf(C *c) { c->f(); }
    void call_Cg(C *c) { c->g(); }
    void call_Df(D* d) { d->f(); }
    void call_Dg(D* d) { d->g(); }
    void call_Ef(E* e) { e->f(); }
    void call_Eg(E* e) { e->g(); }
    void call_Ff(F *ff) { ff->f(); }
    void call_Fg(F *ff) { ff->g(); }	// Invalid: ambiguous

a) Call site:
The caller performs adjustment to match the class of the last overrider of the given function.

  • call_Cf will assume that the pointer needs to be cast to an A*, since C::f is actually A::f. Since A is the primary base class, no adjustment is done at call site.

  • call_Cg is similar, but assumes that the actual type is a B*, and performs the adjustment, since B is not the primary base class.

  • call_Df and call_Dg will assume that the pointer needs to be cast to a D*, which is where D::f is defined. No adjustment is performed at call site.

b) Callee

  • A::f and B::g are defined in classes where there is a single vptr. They don't define a secondary entry point. Because of call-site conventions, they expect to always be called with the correct type.

  • D::f is defined in a class where there is more than one vptr, so it needs a secondary entry point and an entry 'convert_to_D' in the vtable. That's because it can be potentially called with either an A* or a B*. There are two vtables, one for A in D, one for B in D. The D::f entry in A in D points to the non-adjusting entry point, since A shares its vptr.

  • D::g requires a secondary entry point, that will read the same offset 'convert_to_D' from the vtable.

  • E also will require a 'convert_to_E' entry in the vtable, but this time, the vtable for A in C will have to point to an adjusting entry point, since A no longer shares the vptr with E (assuming Other has a vptr). This vtable is also the vtable of C in E.

c) Offsets in the vtable
Offsets have to be placed in the vtable at a position which does not conflict with any offset in the inheritance tree.

convert_to_D and convert_to_E are likely to be at the same offset in the vtable. This is not a problem, even if D and E are used in the same class, such as F, because this is the same offset in different vtables.

  • call_Fg is invalid, because it is ambiguous.

  • A notation such as ((E*) ff)->g() can be used to disambiguate, but in that case, we don't use the same vtable (either the E in F or D in F vtable). The E in F vtable uses that offset as 'convert_to_E', whereas the D in F vtable uses that offset as 'convert_to_D'.

  • Similarly, call_Cf called with an F object will actually be called with the E in F or D in F, which disambiguates which C is actually used. The actual C* passed will have been adjusted by the caller unambiguously, or the call will be invalid.

  • For functions overridden in F, an entry 'convert_to_F' is created anyway. This entry will not overlap with either convert_to_E or convert_to_D.

The fact that an offset is reserved does not mean that it is actually used. A vtable needs to contain the offset only if it refers to a function that will use it. An offset of 0 is not needed, since the function pointer will point to the non-adjusting entry point in that case.
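The interplay between the two entry points and the 'convert_to_D' slot can be modelled by hand. The sketch below is not compiler-generated layout: all types and names in namespace `sketch` are hypothetical, a D object is reduced to two vptr-carrying parts plus one data member, and the adjusting entry is written as a separate function that calls the main entry instead of falling through into it. It also relies on container_of-style pointer arithmetic, which is how such adjustments work in practice.

```cpp
#include <cstddef>

namespace sketch {

struct VTable;

// Hand-written stand-ins for the A and B subobjects of a D object.
struct APart { const VTable* vptr; int a_data; };
struct BPart { const VTable* vptr; int b_data; };
struct DObject { APart a; BPart b; int d_data; };

struct VTable {
    std::ptrdiff_t convert_to_D;  // offset slot, read only by the adjusting entry
    int (*g)(void* self);         // function slot for g
};

// Main (non-adjusting) entry: self is already the address of the D object.
inline int D_g_main_entry(void* self) {
    return static_cast<DObject*>(self)->d_data;
}

// Adjusting entry: self is the B part. Load the offset from the vtable,
// adjust the pointer, then continue into the main entry.
inline int D_g_adjusting_entry(void* self) {
    const VTable* vt = static_cast<BPart*>(self)->vptr;
    return D_g_main_entry(static_cast<char*>(self) + vt->convert_to_D);
}

// A shares its address with D, so its vtable uses the main entry directly
// and the offset slot is unused.
const VTable vtable_A_in_D = { 0, &D_g_main_entry };
const VTable vtable_B_in_D = {
    -static_cast<std::ptrdiff_t>(offsetof(DObject, b)), &D_g_adjusting_entry
};

inline DObject make_D(int value) {
    DObject d = { { &vtable_A_in_D, 0 }, { &vtable_B_in_D, 0 }, value };
    return d;
}

// What a call site holding only a pointer to one part would do.
inline int call_g_through_A(APart* a) { return a->vptr->g(a); }
inline int call_g_through_B(BPart* b) { return b->vptr->g(b); }

}  // namespace sketch
```

Both calls reach the same function body with a pointer to the whole object; only the call through the B part pays for the load and add, which corresponds to the NEW_ADJUST path in the code trail.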

4. Cases where adjustment is performed

In other words, adjustment is made only when necessary, and at a place where it is better scheduled than with thunks. The only bad case is double adjustment for call_Cg called with an E*. This case can probably be considered rare enough, compared to calls such as call_Cg called with a C*, where we now actually do the adjustment at the call-site.

5. Comparing the code trails

Currently, the sequence for a virtual function call in a shared library will look as follows. I'm assuming +DD64; there would be some additional addp4 instructions in +DD32. The trail below is the dynamic execution sequence. The affected code appears between #if/#endif.

        // Compute the address of the vptr in the object,
	// from the this pointer
        // Optional, since vptroffset is often 0.
	// This also adjusts to the class of the final overrider
        addi            Rthis=vptroffset_of_final_overrider,Rthis
        ;;
        // Load the vptr in a register
        ld8             Rvptr=[Rthis]
        ;;
        // Add the offset to get to the function descriptor pointer
	// in the vtable.  Never zero, this instruction is always generated
        addi            Rfndescr=fndescroffset,Rvptr
        ;;
        // (Assuming inlined stub) Load the function address and new GP
        ld8             Rfnaddr=[Rfndescr],8
        ;;
        // Load the new GP
        ld8             GP=[Rfndescr]
        mov             BRn=Rfnaddr
        ;;
        // Perform the actual branch to the target

        // ...
        // ... Branch misprediction almost always, followed by
        // ... I-Cache miss almost always if jumping to a thunk
        br.call B0=BRn

#if OLD_ADJUST
thunk_A::f_from_a_B:
        // If 'adjustment_from_B_to_A' is the 'adjustment_to_A' above,
        // then in the new case, the vtable directly points to A::f
        addi            Rthis,adjustment_from_B_to_A

        // In most cases, we can probably generate a PC-relative branch here
        // It is unclear whether we would correctly predict that branch
        // (since it is assumed that we arrive here immediately following
        // a misprediction at call site)
        br              A::f
#endif // OLD_ADJUST

// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f:
        // Can't be executed in less than 3 cycles?
        addi            Rvptr=class_adjustment_offset,Rvptr
        ;;
        // This loads data which is close to the fn descriptor,
        // so it's likely to be in the D-cache
        ld8             Rvptr=[Rvptr]
        ;;
        add             Rthis=Rthis,Rvptr
#endif

A::f:
        alloc   ...

[990812 All] Discussion of B-6 raises questions of impact on the above approach. Christophe will look at the issues.

[990826 Cygnus -- Jason] [An alternative suggestion from Jason via email.]

Rather than per-function offsets, we have per-target type offsets. These offsets (if any) are stored at a negative index from the vptr. When a derived class D overrides a virtual function F from a base class B, if no previously allocated offset slot can be reused, we add one to the beginning of the vtable(s) of the closest base(s) which are non-virtually derived from B. In the case of non-virtual inheritance, that would be D's vtable; in simple virtual inheritance, it would be B's. The vtables are written out in one large block, laid out like an object of the class, so if B is a non-virtual base of D, we can find the D vtable from the B vptr.

D::f then receives a B*, loads the offset from the vtable, and makes the adjustment to get a D*. The plan is to also have a non-adjusting vtable entry in D's vtable, so we don't have to do two adjustments to call D::f with a D*; the implementation of this is up to the compiler. I expect that for g++, we will do the adjustment in a thunk which just falls into the main function.

The performance problems with classic thunks occur when the thunk is not close enough to the function it jumps to for a pc-relative branch. This cannot be avoided in certain cases of virtual inheritance, where a derived class must whip up a thunk for a new adjustment to a method it doesn't override.

In this case, we will only ever have one thunk per function, so we don't even have to jump. Except in the case of covariant returns, that is, where we will have one per return adjustment. But we know all necessary adjustments at the point of definition of the function, so they can all be within pc-relative branch range.

[Extensive discussion followed by email -- this suggestion is not completely correct, but may be the basis of a workable solution.]

[990831 Cygnus -- Ian] A couple of observations ...

On the state of the art:

The Microsoft approach is worth mentioning. (I haven't seen it discussed -- though perhaps that is because of the patent situation.)

It allows zero-adjusting (i.e. non-thunking) calls for (almost) every virtual function call in a non-virtual, multiple inheritance hierarchy.

For those that are unfamiliar, the idea is that all calls go via the base class vft and overriding functions expect a pointer to the base class type. (That is, if D::f overrides B::f, it expects the first parameter to be of type B*, not D*.) The callee does the necessary static adjustment to get to the derived class 'this' pointer as needed.

It avoids requiring a thunk, and it's often the case that the cost is zero in the callee because the this-adjustment can be folded into other offset computations.
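The callee-side static adjustment described here can be written out in source form. This is only an illustration of the idea, with hypothetical types and a free function standing in for the overriding member function; it says nothing about how any particular compiler implements it.

```cpp
namespace sketch {

struct A { int a; };
struct B { int b; };
struct D : A, B { int d; };

// Stands in for D's override of a function introduced in B: it receives
// the B* that the call site already holds and recovers the D* itself.
inline int D_f_override(B* self) {
    // Static adjustment by a compile-time constant; a compiler can fold it
    // into the member offset used in the access below.
    D* derived = static_cast<D*>(self);
    return derived->d;
}

}  // namespace sketch
```

Because the difference between the B* and the D* is a constant known when the function is compiled, the access to `d` becomes a single load at a fixed displacement from the incoming pointer.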

On the balance, it could well win over all the other approaches being discussed here. [Though, it may lose in some specific cases vs. Christophe's approach where one would create additional extra entries in the derived class vft.]

On when to make extra virtual function table entries for functions:

One of Christophe's suggestions is sort-of separate from the rest of the discussion: making extra entries in the derived class' vft for some overridden virtual functions. It has the benefit of giving you faster calls if you happen to be in (or near) the derived class -- at the expense of space in the vft.

Of course, you can always make the call through the introducing base class, so these extra entries are a pure space/time performance trade off (w/ some unpredictable D-cache effects) and the cost/benefit analysis will depend a little on what the rest of the strategy looks like.

The same idea is potentially applicable, no matter what strategy you actually use for vft layout, and different criteria for deciding what extra entries to make are possible. For example, creating an extra entry when overriding a function introduced in a virtual base has the added benefit of avoiding a cast to a virtual base at the call site.

[990909 All] We are getting closer -- understanding of the alternatives is improving, and Christophe may agree with the Jason/Brian proposal after more thought. To make sure we really understand what we're agreeing to, Jason and Christophe will write up more precise proposal(s).


[991111 jason]

Final virtual calling convention:

We have decided that for virtual functions not inherited from a virtual base, regular thunks will work fine, since we can emit them immediately before the function to avoid the indirect branch penalty; we will use offsets in the vtable for functions that come from a virtual base, because it is impossible to predict what the offset between the current class and its virtual base will be in classes derived from the current class.

The calling convention is as follows:


[991202 all] Adopt Jason's writeup.

# Issue Class Status Source Opened Closed
B-2 Covariant return types call closed SGI 990520 990722
Summary: There are several methods for adjusting the 'this' pointer of the returned value for member functions with covariant return types. We need to decide how this is done. Return thunks might be especially costly on IA64, so a solution based on returning multiple pointers may prove more interesting.
Resolution: Provide a separate Vtable entry for each return type.

[990610 Matt] One possibility is to have two Vtable entries, which might point to different functions, different entrypoints, or a real entrypoint and a thunk. Another is to return two result pointers (base/derived), and have the caller select the right one.

[990715 All] Daveed presented his multiple-return-value scheme, including an example that involved virtual base classes, return values that are pointers to nonpolymorphic classes, and other equally horrible things.

Consensus: we need to get the horrible cases correct, but speed only matters in the simple case. The simple case: class B has a virtual function f returning a B1* and class D has a virtual function f returning a D1*, where all four classes are polymorphic, B is a primary base of D, and B1 is a primary base of D1. (The really important case is where B1 is B and D1 is D, but that simplification doesn't make any difference.)

Jason: Would the usual multiple-entry-point scheme work just as well? That is, would it be just as fast as Daveed's scheme in the simple case, and still preserve enough information for the more complicated cases? It appears so, but we don't have a proof. Jason will try to provide one.

[990716 Cygnus -- Jason] Proof? You always know what types a given override must be able to return, and you know how to convert from the return type to those base types. You know from the entry point which type is desired. Seems pretty straightforward to me.

[990716 Cygnus -- Jason] The alternative I was talking about yesterday goes something like this:

When we have a non-trivial covariant return situation, we create a new entry in the vtable for the new return type. The caller chooses which vtable entry to use based on the type they want.

This could be implemented several ways, at the discretion of the vendor:

  1. Multiple entry points to one function, with an internal flag indicating which type to return.

  2. Thunks which intercept the function's return and modify the return value. Note that unlike the case of calling virtual functions, for covariant returns we always know which adjustments will be needed, so we don't have to pay for a long branch. We do, however, lose the 1-1 correspondence between calls and returns, which apparently affects performance on the Pentium Pro.

  3. Function duplication.
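
A hand-written sketch of the first two options may make the difference concrete. Nothing here is generated by a compiler: `f_body`, `f_returning_B1`, `f_flagged`, `Want` and `Pad` are hypothetical names, and D1 is deliberately given a first base other than B1 so that the conversion from D1* to B1* changes the address.

```cpp
struct Pad { virtual ~Pad() = default; long pad = 0; };  // hypothetical filler base
struct B1  { virtual ~B1() = default; int b1 = 1; };
struct D1 : Pad, B1 { int d1 = 2; };  // B1 subobject is not at offset zero

// The single real function body: it produces the most derived return type.
D1* f_body(D1& obj) { return &obj; }

// Option 2 by hand: a stub that intercepts the result and converts it.
// The implicit D1* -> B1* conversion performs the offset adjustment.
B1* f_returning_B1(D1& obj) { return f_body(obj); }

// Option 1 by hand: one body, with a flag saying which type the caller wants.
enum class Want { B1Ptr, D1Ptr };
void* f_flagged(D1& obj, Want want) {
    D1* result = &obj;
    if (want == Want::B1Ptr) {
        return static_cast<B1*>(result);  // adjusted pointer
    }
    return result;                        // unadjusted pointer
}

// Checks that the stub and the flagged entry agree with a plain conversion.
bool options_agree() {
    D1 x;
    void* expected_b1 = static_cast<B1*>(&x);
    void* expected_d1 = &x;
    return f_returning_B1(x) == static_cast<B1*>(&x)
        && f_flagged(x, Want::B1Ptr) == expected_b1
        && f_flagged(x, Want::D1Ptr) == expected_d1;
}
```

Option 3 would simply be two complete copies of the function, one per return type.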

The advantage of this approach to the complex case is that we don't have to do a dynamic_cast when faced with multiple levels of virtual derivation. It is also strictly simpler; Daveed's model already requires something like this in cases of multiple inheritance.

Of course, we can always mix and match; we could choose to only do this in cases of virtual inheritance, or use Daveed's proposal and do this only in cases of repeated virtual inheritance. In that case, the multiple returns would just be an optimization for the single virtual inheritance case.

Since we don't seem to care about the performance of anything but single nonvirtual inheritance, it seems simpler not to bother with multiple returns.

The remaining question is how to handle the case of nontrivial nonvirtual inheritance: do we use multiple slots or have the caller do the adjustment? My inclination is to have the caller adjust.

WRT patents, the idea of having the function return the base-most class and having the caller adjust is parallel to the patented Microsoft scheme whereby they pass the base-most class as the 'this' argument to virtual functions, but the word 'return' does not appear anywhere in the patent, so it seems safe.

[990722 All] The group was generally agreed that the simplicity of multiple entries in the vtable outweighed any space/performance advantage of more complex schemes (e.g. the method Daveed described on 15 July). Discussion focussed on whether it is worthwhile to eliminate some of the entries in cases where they are unnecessary because the caller knows the required conversion, namely when the return type has a unique non-virtual subobject of the original return type.

Agreement was reached to avoid the complication of eliminating some of the Vtable entries. Thus, the Vtable will have one entry for each accessible return type of a covariant virtual function. These may be implemented in a variety of ways, e.g. duplicated functions, separate entrypoints, or stubs, and the ABI need not specify the choice. The location of the Vtable entries is part of the separate Vtable layout issue B-6.
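
The following sketch shows, at source level, a covariant override whose two return types need different pointer values, which is the situation the separate Vtable entries serve. The names `Pad`, `obj`, `Results` and `covariant_adjustment_demo` are hypothetical. How the two entries are realized (duplicated function, extra entry point, or stub) is not visible in the source and is left to the implementation, as stated above.

```cpp
struct Pad { virtual ~Pad() = default; long pad = 0; };  // hypothetical filler base
struct B1  { virtual ~B1() = default; int b1 = 1; };
struct D1 : Pad, B1 { int d1 = 2; };  // B1 is a non-primary base of D1

struct B {
    virtual ~B() = default;
    virtual B1* f() = 0;
};
struct D : B {
    D1 obj;                            // hypothetical member returned by f
    D1* f() override { return &obj; }  // covariant with B::f
};

struct Results {
    bool same_object;       // both results designate the same D1 object
    bool addresses_differ;  // yet the raw pointer values are different
};

Results covariant_adjustment_demo() {
    D d;
    B* pb = &d;
    D* pd = &d;
    B1* as_b1 = pb->f();  // caller wants a B1*: result arrives already adjusted
    D1* as_d1 = pd->f();  // caller wants a D1*: result arrives unadjusted
    Results r;
    r.same_object = (as_b1 == static_cast<B1*>(as_d1));
    r.addresses_differ =
        static_cast<void*>(as_b1) != static_cast<void*>(as_d1);
    return r;
}
```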

# Issue Class Status Source Opened Closed
B-3 Allowed caching of vtable contents call closed HP 990603 990805
Summary: The contents of the vtable can sometimes be modified, but the consensus is that it is nonetheless always allowed to "cache" elements, i.e. to retain them in registers and reuse them, whenever it is really useful. However, this may sometimes break "beyond the standard" code, such as code loading a shared library that replaces a virtual function. Can we all agree when caching is allowed?
Resolution : Caching is allowed.

[990604 HP -- Christophe] Mike (Ball) gave me what I believe is an excellent definition of when caching is allowed. I'd like him to present it.

[990805 All] Christophe explained that the rule is simply that, within a call to a member function of the class, the class Vtable may not be modified. Between such calls, no assumption may be made. With this observation, the issue is closed.

[990812 All] The rule is even simpler. Once a program changes the type of a pointer's target, the pointer is invalidated, and its value may not be reused. Therefore, a code sequence which repeatedly refers to the same pointer value is invalid if the pointee's vtable has been changed.
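
A minimal sketch of the 990812 rule, using placement new as one way a program can change the type of a pointer's target. The class names and `retype_storage_demo` are hypothetical. The old pointer `p` is not used after the object it pointed to is destroyed; the code continues with the fresh pointer returned by the second placement new, so a compiler that cached Vtable contents loaded through `p` would not be wrong.

```cpp
#include <new>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square   : Shape { int sides() const override { return 4; } };

static_assert(sizeof(Triangle) == sizeof(Square), "same storage fits both");
static_assert(alignof(Triangle) == alignof(Square), "same alignment for both");

// Builds a Triangle, replaces it with a Square in the same storage, and
// returns the two sides() results packed as before * 10 + after.
int retype_storage_demo() {
    alignas(Triangle) unsigned char storage[sizeof(Triangle)];

    Shape* p = new (storage) Triangle;
    int before = p->sides();  // dispatches through Triangle's Vtable
    p->~Shape();              // target's lifetime ends; p is now invalid

    Shape* q = new (storage) Square;  // new object, new pointer value to use
    int after = q->sides();           // dispatches through Square's Vtable
    q->~Shape();

    return before * 10 + after;
}
```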

# Issue Class Status Source Opened Closed
B-4 Function descriptors in vtable data closed HP 990603 990805
Summary: For a runtime architecture where the caller is expected to load the GP of the callee (if it is in, or may be in, a different DSO), e.g. HP/UX, what should vtable entries contain? One possibility is to put a function address/GP pair in the vtable. Another is to include only the address of a thunk which loads the GP before doing the actual call.
Resolution : The Vtable will contain a function address/GP pair.

[990624 All] Note that putting GP in the Vtable prevents putting it in shared memory. See B-7.
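
As a rough picture of the resolution, a Vtable slot holding a function address/GP pair could be modeled as below. This is only a sketch: the type and field names (`FunctionDescriptor`, `entry`, `gp`, `ToyVtable`, `slot_offset`) are hypothetical, the 64-bit field width is an assumption, and the actual Vtable layout belongs to issue B-6.

```cpp
#include <cstddef>
#include <cstdint>

// One virtual function slot: code address plus the GP value of the DSO
// that defines the function (assumed to be 64 bits each).
struct FunctionDescriptor {
    std::uint64_t entry;  // address of the function's entry point
    std::uint64_t gp;     // global pointer the callee expects
};

// A toy Vtable fragment with two virtual function slots.
struct ToyVtable {
    FunctionDescriptor slots[2];
};

// Byte offset of slot `index` from the first slot.
constexpr std::size_t slot_offset(std::size_t index) {
    return index * sizeof(FunctionDescriptor);
}

static_assert(sizeof(FunctionDescriptor) == 16, "two 64-bit words per slot");
```

Because each slot carries a GP value that is only known once the defining DSO is loaded, the slot contents vary per process, which is the sharing concern raised in the 990624 note.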

[990805 All] I