C++ ABI Closed Issues
Revised 17 November 2000
| call | Function call interface, i.e. call linkage |
| data | Data layout |
| lib | Runtime library support |
| lif | Library interface, i.e. API |
| g | Potential gABI impact |
| ps | Potential psABI impact |
| source | Source code conventions (i.e. API, not ABI) |
| tools | May affect how program construction tools interact |
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-1 | Vptr location | data | closed | SGI | 990520 | 990624 |
| Summary: Where is the Vptr stored in an object (first or last are the usual answers). | ||||||
[990610 All] Given the absence of addressing modes with displacements on IA-64, the consensus is to answer this question with "first."
[990617 All] Given a Vptr and only non-polymorphic bases, which (Vptr or base) goes at offset 0?
Tentative decision: Vptr always goes at beginning.
[990624 All] Accepted tentative decision. Rename, close this issue, and open separate issue (B-6) for Vtable layout.
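A minimal sketch of what the "Vptr at offset zero" decision implies for a class with a Vptr and only a non-polymorphic base. The type names and the helper `plain_base_offset` are illustrative and do not come from the issue text. The observed offset is implementation-specific, and is only expected to be nonzero on compilers that follow this decision.

```cpp
#include <cassert>
#include <cstddef>

struct PlainBase { int value; };      // non-polymorphic base (hypothetical)

struct Derived : PlainBase {          // first class in the hierarchy with a Vptr
    virtual ~Derived() {}
};

// Returns the byte offset of the PlainBase subobject inside a Derived object.
std::ptrdiff_t plain_base_offset() {
    Derived d;
    const unsigned char* whole = reinterpret_cast<const unsigned char*>(&d);
    const unsigned char* base =
        reinterpret_cast<const unsigned char*>(static_cast<PlainBase*>(&d));
    return base - whole;
}

// With the Vptr placed first, the non-polymorphic base lands after it.
void check_vptr_first() {
    assert(plain_base_offset() > 0);
}
```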
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-2 | Virtual base classes | data | closed | SGI | 990520 | 990624 |
| Summary: Where are the virtual base subobjects placed in the class layout? How are data member accesses to them handled? | ||||||
[990610 Matt] With regard to how data member accesses are handled, the choices are to store either a pointer or an offset in the Vtable. The consensus seems to be to prefer an offset.
[990617 All] Any number of empty virtual base subobjects (rare) will be placed at offset zero. If there are no non-virtual polymorphic bases, the first virtual base subobject with a Vpointer will be placed at offset zero. Finally, all other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
[990624 All] Define an empty object as one with no non-static, non-empty data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes. Define a nearly empty object as one which contains only a Vptr. The above resolution is accepted, restated as follows:
Any number of empty virtual base subobjects (rare, because they cannot have virtual functions or bases themselves) will be placed at offset zero, subject to the conflict rules in A-3 (i.e. this cannot result in two objects of the same type at the same address). If there are no non-virtual polymorphic base subobjects, the first nearly empty virtual base subobject will be placed at offset zero. Any virtual base subobjects not thus placed at offset zero will be allocated at the end of the class, in left-to-right, depth-first declaration order.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-3 | Multiple inheritance | data | closed | SGI | 990520 | 990701 |
| Summary: Define the class layout in the presence of multiple base classes. | ||||||
[990617 All] At offset zero is the Vptr whenever there is one, as well as the primary base class if any (see A-7). Also at offset zero is any number of empty base classes, as long as that does not place multiple subobjects of the same type at the same offset. If there are multiple empty base classes such that placing two of them at offset zero would violate this constraint, the first is placed there. (First means in declaration order.)
All other non-virtual base classes are laid out in declaration order at the beginning of the class. All other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
The above ignores issues of padding for alignment, and possible reordering of class members to fit in padding areas. See issue A-9.
[990624 All] There remains an issue concerning the selection of the primary base class (see A-7), but we are otherwise in agreement. We will attempt to close this on 1 July, modulo A-7.
[990701 All] This issue is closed. A full description of the class layout can be found in issue A-9. (At this time, A-7 remains to be closed, waiting for the Taligent rationale.)
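The empty-base rule and its same-type conflict constraint can be illustrated with a short sketch. The class names are hypothetical. The size check reflects common compiler behavior rather than a language guarantee, while the distinct-address check follows from the requirement that two subobjects of the same type never share an address.

```cpp
#include <cassert>

struct Empty {};                      // empty class
struct Holder : Empty { int x; };     // a lone empty base can sit at offset zero
struct A : Empty {};
struct C : Empty {};
struct B : A, C {};                   // contains two Empty subobjects

// An empty base placed at offset zero normally adds no storage.
bool empty_base_adds_no_size() { return sizeof(Holder) == sizeof(int); }

// The two Empty subobjects of B cannot both be placed at offset zero.
bool same_type_subobjects_are_distinct() {
    B b;
    Empty* via_a = static_cast<A*>(&b);
    Empty* via_c = static_cast<C*>(&b);
    return via_a != via_c;
}

void check_empty_bases() {
    assert(empty_base_adds_no_size());
    assert(same_type_subobjects_are_distinct());
}
```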
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-4 | Empty base classes | data | closed | SGI | 990520 | 990624 |
| Summary: Where are empty base classes allocated? (An empty base class is one with no non-static data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.) | ||||||
[990624 All] Closed as a duplicate of A-3.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-5 | Empty parameters | data | closed | SGI | 990520 | 001117 |
| Summary: When passing a parameter with an empty class type by value, what is the convention? | ||||||
| Resolution: Except for cases of non-trivial copy constructors (see C-7) and parameters in the variable part of varargs lists, a single parameter slot will be allocated to empty parameters, as though they were a struct containing a single character. | ||||||
[990623 SGI] We propose that no parameter slot be allocated to such parameters, i.e. that no register be used, and that no space in the parameter memory sequence be used. This implies that the callee must allocate storage at a unique address if the address is taken (which we expect to be rare).
[990624 All] In addition to the address-taken case, care is required if the object has a non-trivial copy constructor. HP observes that in (some?) such cases, they perform the construction at the call site and pass the object by reference.
[990625 SGI -- Jim] I understand that the Standard explicitly allows elimination of even non-trivial copy construction in some cases. Is this one of them? Where should I look? Also, of course, varargs processing for elided empty parameters would need to be careful.
I have opened a new issue (C-7) for passing copy-constructed parameters by reference. Since doing so would turn an empty value parameter into a non-empty reference parameter, this issue can ignore such cases.
[990701 All] An empty parameter will not occupy a slot in the parameter sequence unless:
Daveed and Matt will pursue the question of when copy constructors may be ignored for parameters with the Core committee, and if they identify cases where the constructors may clearly be omitted, those (empty) parameters will also be elided.
[001109 CodeSourcery -- Mark] Both g++ and the HP compiler have great difficulty dealing with this, and prefer to reserve the parameter slot even for empty parameters. At the meeting, we tentatively decided to reverse our decision and allocate an integer parameter slot even for empty parameters. We will place no constraints on the data in the parameter slot, except that on IA-64, it must not be NaT data.
[001117 All -- Jim] There having been no objection to the proposed resolution, it is adopted. Results will be treated the same way.
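As a rough illustration of the adopted resolution, the sketch below compares an empty class with the "struct containing a single character" that the resolution uses as its stand-in. The names are hypothetical. Parameter slots themselves cannot be observed from portable code, so only size and alignment are compared here, and the equality is typical compiler behavior, not a guarantee.

```cpp
#include <cassert>

struct Empty {};                 // empty class type
struct OneChar { char c; };      // the stand-in named by the resolution

// An ordinary by-value call. Under the resolution, each Empty argument
// occupies one parameter slot, just as a OneChar argument would.
int sum_ignoring_empties(Empty, int a, Empty, int b) { return a + b; }

bool empty_matches_one_char() {
    return sizeof(Empty) == sizeof(OneChar) &&
           alignof(Empty) == alignof(OneChar);
}

void check_empty_parameter() {
    assert(empty_matches_one_char());
    assert(sum_ignoring_empties(Empty(), 2, Empty(), 3) == 5);
}
```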
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-6 | RTTI .o representation | data call ps | closed | SGI | 990520 | 991028 |
Summary:
Define the data structure to be used for RTTI, that is:
| ||||||
| Resolution: Defined in the Draft C++ ABI for IA-64. | ||||||
[990701 All] Daveed will put together a proposal by the 15th (action #13); the group will discuss it on the 22nd.
[990805 All] Daveed should have his proposal together for discussion. Michael Lam will look into the Sun dynamic cast algorithm.
It was noted that appropriate name selection along with the normal DSO global name resolution should be sufficient to produce a unique address for each class' RTTI struct, which address would then be a suitable identifier for comparisons.
[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.
[990819 EDG -- Daveed] (Proposal replaced by later version on 6 October.)
[990826 All] Discussion centered on whether the representation should include all base classes or just the direct ones, and in the former case how hashing might be handled. It was agreed that the __qualifier_type_info variant is not needed, and it is now stricken in the above proposal. Also, a pointer-to-member variant is needed. Christophe will provide a description of the HP hashing approach, and Daveed will update the specification.
[991006 EDG -- Daveed]
The C++ programming language definition implies that information about types be available at run time for three distinct purposes:
The following conclusions were arrived at by the attending members of the C++ IA-64 ABI group:
The full proposal has been incorporated in the Draft C++ ABI for IA-64.
[991014 all]
ACTION ITEMS: Daveed---make these changes. Jim---incorporate these changes into the open issues list. We are almost ready to close this issue; we intend to close it at the 28 October meeting, after we've all had a chance to go over the modified writeup.
[991028 all]
[990617 All] It will be shared with the first polymorphic non-virtual base class, or if none, with the first nearly empty polymorphic virtual base class. (See A-2 for the definition of nearly empty.)
[990624 All] HP noted that Taligent chooses a base class with virtual bases before one without as the primary base class, probably to avoid additional "this" pointer adjustments. SGI observed that such a rule would prevent users from controlling the choice by their ordering of the base classes in the declaration. The bias of the group remains the above resolution, but HP will attempt to find the Taligent rationale before this is decided.
[990729 All] Close with the agreed resolution. If a convincing Taligent rationale is found, we can reconsider.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-8 | (Virtual) base class alignment | data | closed | HP | 990603 | 990624 |
| Summary: A (virtual) base class may have a larger alignment constraint than a derived class. Do we agree to extend the alignment constraint to the derived class? (An alternative for virtual bases: allow the virtual base to move in the complete object.) | ||||||
[990623 SGI] We propose that the alignment of a class be the maximum alignment of its virtual and non-virtual base classes, non-static data members, and Vptr if any.
[990624 All] Above proposal accepted. (SGI observation: the size of the class is rounded up to a multiple of this alignment, per the underlying psABI rules.)
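The accepted rule can be checked against a small hierarchy. This is a sketch with hypothetical class names. It assumes a compiler that follows the rule as stated, and it uses `alignof(void*)` as a stand-in for the alignment of the Vptr.

```cpp
#include <cassert>
#include <cstddef>

struct Wide { long double ld; };               // strictly aligned member
struct VBase { virtual ~VBase() {} Wide w; };  // polymorphic virtual base
struct Derived : virtual VBase { char c; };

std::size_t max_of(std::size_t a, std::size_t b) { return a > b ? a : b; }

// Maximum over the virtual base, the data member, and the Vptr.
std::size_t proposed_alignment() {
    return max_of(alignof(VBase), max_of(alignof(char), alignof(void*)));
}

void check_alignment() {
    assert(alignof(Derived) == proposed_alignment());
    // The size is rounded up to a multiple of the alignment.
    assert(sizeof(Derived) % alignof(Derived) == 0);
}
```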
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-9 | Sorting fields as allowed by [class.mem]/12 | data | closed | HP | 990603 | 990624 |
| Summary: The standard constrains ordering of class members in memory only if they are not separated by an access clause. Do we use an access clause as an opportunity to fill the gaps left by padding? | ||||||
| Resolution: See separate writeup of Draft C++ ABI for IA-64. | ||||||
[990610 all] Some participants want to avoid attempts to reorder members differently than the underlying C struct ABI rules. Others think there may be benefit in reordering later access sections to fill holes in earlier ones, or even in base classes.
[990617 all] There are several potential reordering questions, more or less independent:
There is no apparent support for (1), since no simple heuristic has been identified with obvious benefits. There is interest in (2), based on a simple heuristic which might sometimes help and will never hurt. However, it is not clear that it will help much, and Sun objects on grounds that they prefer to match C struct layout. Unless someone is interested enough to implement and run experiments, this will be hard to agree upon. G++ has implemented (3) as an option, based on specific user complaints. It clearly helps HP's example of a base class containing a word and flag, with a derived class adding more flags. Idea (4) has more problems, including some non-intuitive (to users) layouts, and possibly complicating the selection of bitwise copy in the compiler.
[990624 all] We will not do (1), (2), or (4). We will do (3). Specifically, allocation will be in modified declaration order as follows:
[990722 all] The precise placement of empty bases when they don't fit at offset zero remained imprecise in the original description. Accordingly, a precise layout algorithm is described in a separate writeup of Data Layout.
[990729 all] The layout writeup was accepted, with the first choice for empty base placement. That is, if placement at offset zero doesn't work, it will be placed like a normal base/member. The consensus was that this won't happen often, and such bases will often overlap with the preceding tail padding or following components anyway. Jim will modify the writeup accordingly.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-10 | Class parameters in registers | call | closed | HP | 990603 | 990710 |
Summary:
The C ABI specifies that structs are passed in registers.
Does this apply to small non-POD C++ objects passed by value?
What about the copy constructor and this pointer in that case?
| ||||||
[990701 all] A separate issue (C-7) deals with cases where a non-trivial copy constructor is required; we ignore those cases here. Our conclusion is that, without a non-trivial copy constructor, we need not be concerned about the class object moving in the process of being passed, and there is no need to use a mechanism different from the base ABI C struct mechanism. At the same time, if we do use the underlying C struct mechanism, the user has complete control of the passing technique, by choosing whether to pass by value or reference/pointer.
Therefore, except in cases identified by issue C-7 for different treatment, class parameters will be passed using the underlying C struct protocol.
[990729 All] Jason described the g++ implementation, which is a three-member struct:
A concern about covariant returns was raised. It was observed that, given our decision to use distinct Vtable entries for distinct return types, no further concern is required here. Others will describe their representations. IBM has an alternative, but it is believed to be patented by Microsoft.
[990805 All] It is agreed that a two-element struct will be used for a pointer to a member function, with elements as follows:
ptr:
adj:
Although we agreed to close this, SGI suggests a minor modification. Since the Vtable offset of a virtual function will always be even, we suggest that it not be doubled before adding 1. This is because shifts are more restricted on many processors than other integer ALU operations (shifters are large structures), so an XOR or NAND will often be cheaper than a right shift.
[990812 All] Close this issue with the suggested modification.
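The SGI modification can be sketched with a toy model of the two-element struct. The element descriptions are not reproduced in this list, so the field meanings below are assumptions: `ptr` is taken to hold either a function address (low bit clear) or a Vtable offset plus 1, and `adj` a `this` adjustment. The struct and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model only; it does not inspect a real compiler's layout.
struct MemFnPtrModel {
    std::ptrdiff_t ptr;  // assumed: function address, or Vtable offset + 1
    std::ptrdiff_t adj;  // assumed: adjustment applied to `this`
};

// Encode a virtual function: the (even) Vtable offset is not doubled,
// only incremented, so the low bit marks the pointer as virtual.
MemFnPtrModel make_virtual(std::ptrdiff_t vtable_offset, std::ptrdiff_t adj) {
    assert(vtable_offset % 2 == 0);
    MemFnPtrModel p = {vtable_offset + 1, adj};
    return p;
}

bool is_virtual(const MemFnPtrModel& p) { return (p.ptr & 1) != 0; }

// Recover the Vtable offset without a shift.
std::ptrdiff_t vtable_offset(const MemFnPtrModel& p) {
    assert(is_virtual(p));
    return p.ptr - 1;
}
```

For example, `make_virtual(16, 0)` yields `ptr == 17`, from which `vtable_offset` recovers 16 with a single subtraction.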
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-12 | Merging secondary vtables | data | closed | Sun | 990610 | 990805 |
| Summary: Sun merges the secondary Vtables for a class (i.e. those for non-primary base classes) with the primary Vtable by appending them. This allows their reference via the primary Vtable entry symbol, minimizing the number of external symbols required in linking, in the GOT, etc. | ||||||
| Resolution: Concatenate the Vtables associated with a class in the same order that the corresponding base subobjects are allocated in the object. | ||||||
[990701 Michael Lam] Michael will check what the Sun ABI treatment is and report back.
[990729 All] A separate issue raised in conjunction with A-7 is whether to include Vfunc pointers in the primary Vtable for functions defined only in the base classes and not overridden. If the primary and secondary Vtables are concatenated, this is no longer an issue, since all can be referenced from the primary Vptr.
[990805 All] All of the Vtables associated with a class will be concatenated, and a single external symbol used (to be identified as part of the mangling issue F-1). The order of the tables will be the same as the order of base class subobjects in an object of the class, i.e. first the primary Vtable, then the non-virtual base classes in declaration order, and finally the virtual base classes in depth-first declaration order.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-13 | Parameter struct field promotion | call | closed | SGI | 990603 | 990701 |
| Summary: It is possible to pass small classes either as memory images, as is specified by the base ABI for C structs, or as a sequence of parameters, one for each member. Which should be done, and if the latter, what are the rules for identifying "small" classes? | ||||||
| Resolution: No special treatment will be specified by the ABI. | ||||||
[990701 all] Define no special treatment for this case in the ABI. A translator with control over both caller and callee may choose to optimize.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-14 | Pointers to data members | data | closed | SGI | 990729 | 990805 |
| Summary: How should pointers to data members be represented? | ||||||
| Resolution: Represented as one plus the offset from the base address. | ||||||
[990729 SGI]
We suggest an offset from the base address of the class,
represented as a ptrdiff_t.
[990805 All]
Such pointers are represented as one plus the offset from the base
address of the class, as a ptrdiff_t.
NULL pointers are zero.
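A minimal model of this encoding, as recorded in the resolution above. The helper names are hypothetical. The code computes the encoding by hand and does not examine the representation any particular compiler uses, which may differ.

```cpp
#include <cassert>
#include <cstddef>

typedef std::ptrdiff_t DataMemberPtrModel;   // hypothetical model type

const DataMemberPtrModel kNullMember = 0;    // NULL pointers are zero

// One plus the offset, so a member at offset zero is distinct from NULL.
DataMemberPtrModel encode_member(std::size_t offset) {
    return static_cast<std::ptrdiff_t>(offset) + 1;
}

bool is_null_member(DataMemberPtrModel p) { return p == kNullMember; }

std::size_t member_offset(DataMemberPtrModel p) {
    assert(!is_null_member(p));
    return static_cast<std::size_t>(p - 1);
}

struct Point { int x; int y; };              // example class
```

With this model, `encode_member(offsetof(Point, x))` is 1 rather than 0, which is what lets the first member be told apart from a NULL pointer to member.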
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-15 | Empty bit-fields | data | closed | CodeSourcery | 991214 | 000106 |
| Summary: How are zero-length bit-fields handled? | ||||||
| Resolution: Zero-length bit-fields do not prevent a class from being considered empty or nearly empty. | ||||||
[991214 CodeSourcery -- Mark]
Question: Does the presence of a zero-width bit-field prevent a class from being empty?
Suggested Resolution: No. Amend the definition of an "empty class" to read:
Amend the definition of a "nearly empty class" to read:
[000106 All] Accept the CodeSourcery proposal.
[000106 All] Accept the proposal.
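The effect of the resolution can be observed through the standard `std::is_empty` trait, which likewise disregards zero-length bit-fields. This is a sketch with hypothetical struct names. The trait is used as a proxy for the ABI's notion of "empty", which is an assumption.

```cpp
#include <cassert>
#include <type_traits>

struct OnlyZeroWidth { int : 0; };        // nothing but a zero-length bit-field
struct WithData { int : 0; int x; };      // has a real data member

bool zero_width_only_is_empty() { return std::is_empty<OnlyZeroWidth>::value; }
bool with_data_is_empty() { return std::is_empty<WithData>::value; }

void check_zero_width_bit_fields() {
    assert(zero_width_only_is_empty());
    assert(!with_data_is_empty());
}
```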
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-17 | Primary indirect virtual base allocation | data | closed | SGI | 991228 | 000113 |
| Summary: When a nearly empty virtual base class A is allocated as the primary base class of class B, and then B is allocated as a base class of C, should A (i.e. its vptr) be separately allocated in C, or should its first occurrence in a previously allocated base B be used as its allocation in C? | ||||||
| Resolution: Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order. | ||||||
[991228 SGI -- Jim] Specific wording for a proposed change is in the Draft C++ ABI for IA-64.
[000103 CodeSourcery -- Mark]
I think the current proposal for allocating virtual bases is still a
little suboptimal. In particular, given:
struct A { void f(); };
struct B : virtual public A { };
struct C : virtual public A, virtual public B { };
we'll give `C' a larger size than for:
struct C : virtual public B, virtual public A { };
where we'll reuse the `A' part of `B' rather than reallocating it.
I know that ordering can already affect size (principally because of alignment issues) but I think that in this case we might as well not punish programmers for choosing the "wrong" ordering.
I think we should change the green A-17 proposed resolution to indicate that if one of the virtual bases is a (direct or indirect) primary base of one of the other virtual bases then we need not allocate a fresh copy.
FWIW, it turns out to actually be easier in GCC to code the more generous version.
The algorithm to do this is linear in the size of the hierarchy: just iterate through the inheritance DAG marking all primary bases. Any virtual base classes that remain unmarked need to be allocated in step III. A slight formalization of this sentence might be a good way to express which bases to choose for III.
[000113 All]
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-18 | Virtual base alignment | data | closed | SGI | 991228 | 000113 |
| Summary: Should virtual bases have a different effect on class alignment than other components? | ||||||
| Resolution: Yes. When allocating the non-virtual part of a base class, use its non-virtual alignment, i.e. ignoring its virtual bases' contributions. | ||||||
[991228 SGI -- Jim] Since the allocation of virtual bases is "floating" relative to the classes in which they occur, it is possible for them to have independent alignment constraints. Specifically, when allocating a base class with a virtual base, we could treat its alignment as that obtained by ignoring the virtual base, and later allocate the virtual base with greater alignment.
Since the class with a virtual base already has a vptr, this only matters if the virtual base contains components more strictly aligned than a pointer. Thus, the benefit of doing so is probably not large. To get some idea of the effect on the layout definition, look at dsize and nvsize, and assume a similar pair of alignment values.
[000106 All] No strong opinions were expressed on this issue. We will decide it at the next meeting after people have a chance to think it over. The bias will be to keep the current simpler definition.
[000113 All] It turns out that both Compaq and someone else (Cygnus?) already do this, find it straightforward, and prefer to keep it. Therefore, accept the suggestion that when allocating the non-virtual part of a base class, we use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-19 | Primary indirect virtual base choice | data | closed | All | 000106 | 000120 |
| Summary: In allocating class C, when the first nearly empty virtual base class A is allocated as the primary base class of a later nearly empty virtual base class B, should A or B become the primary base class of C? | ||||||
| Resolution: Do not use a virtual base as primary if it is already a primary base of some other direct or indirect base, unless such are the only candidates. In either case, use the first candidate in depth-first, left-to-right order in the inheritance graph. | ||||||
[000106 All] This issue was initially confused in the discussion with A-17, but is independent. Recall that non-virtual bases have priority over virtual bases for selection as the primary base. Assuming that no non-virtual base is suitable, this issue involves which virtual base should be selected. Our original decision was to use the first in left-to-right order.
The proposal here is that, if this initial candidate A is itself already a primary base class of a later virtual base B, then B will be used instead, unless it is already a primary base class of a later virtual base, and so on. See proposed wording in the ABI layout document.
No one can identify a case in which this approach is worse than the original definition.
[000113 All] The proposed resolution on the table is to use the following priority to choose the primary base class:
[000113 All] Modify the above to use any virtual base in the inheritance graph, first one that is not already primary to some base if possible, or then any candidate, chosen as the first in a depth-first, left-to-right inheritance graph walk.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-20 | Operator new array cookies | data | closed | All | 000113 | 000120 |
| Summary: When operator new is used to create a new dynamic-length array, a cookie must be stored to remember the allocated length so that it can be deallocated correctly. | ||||||
| Resolution: In principle, place cookie immediately before array, aligned naturally. Use no cookie for array element types without destructors. See the Draft C++ ABI for IA-64. | ||||||
[000113 All] The proposed resolution is as follows:
sizeof(size_t).
align be the maximum alignment of
size_t and an element of the array to be allocated.
align bytes.
align bytes.
align bytes
from the space allocated for the array.
sizeof(size_t) bytes
immediately preceding the array data.
sizeof(size_t)
is smaller than the array element alignment,
and if present will precede the cookie.
[000120 All] Accept the above.
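A sketch of the cookie placement summarized in the resolution: the cookie sits in the `sizeof(size_t)` bytes immediately before the array data. The padding rule used here (the larger of `sizeof(size_t)` and `align`) is an assumption read from the fragments above, and all names are hypothetical. The definitive rules are in the Draft C++ ABI for IA-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct CookieLayout {
    std::size_t align;    // alignment of the allocated block
    std::size_t padding;  // bytes between block start and array data
};

// Assumed rule: align = max(alignof(size_t), element alignment),
// padding = max(sizeof(size_t), align).
CookieLayout cookie_layout(std::size_t elem_align) {
    CookieLayout l;
    l.align = elem_align > alignof(std::size_t) ? elem_align : alignof(std::size_t);
    l.padding = l.align > sizeof(std::size_t) ? l.align : sizeof(std::size_t);
    return l;
}

// Writes the element count just before the array data; returns the data pointer.
unsigned char* store_cookie(unsigned char* block, const CookieLayout& l,
                            std::size_t count) {
    assert(block != 0);
    unsigned char* data = block + l.padding;
    std::memcpy(data - sizeof(std::size_t), &count, sizeof(count));
    return data;
}

// Reads the element count back from just before the array data.
std::size_t load_cookie(const unsigned char* data) {
    std::size_t count;
    std::memcpy(&count, data - sizeof(std::size_t), sizeof(count));
    return count;
}
```

Under this assumed rule, any extra padding appears only when the element alignment exceeds `sizeof(size_t)`, and it precedes the cookie.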
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-21 | Placement new array cookies | data | closed | All | 000113 | 000217 |
| Summary: Same issue as A-20, except that for placement new, the user supplies already-allocated space. Therefore, there is a conflict between wanting to make delete() work on arrays created in this way, and wanting to avoid surprising users who haven't allocated enough space for the cookie. Also, are cookies allocated if there is no destructor? | ||||||
Resolution:
Use no cookie for element types with no destructors,
nor for ::operator new(size_t, void*).
Otherwise, use a cookie as in issue A-20.
See the Draft C++ ABI for IA-64.
| ||||||
[000119 SGI -- Matt]
What the standard says (3.7.3.1, 5.3.4, and 18.4.1.3)
Array placement new has the form "new(ARGS) T[n]". The "(ARGS)" part is optional. If it's present then this is a placement new-expression, and we use a version of operator new[] with two or more arguments, otherwise it's an ordinary new-expression, and we use a version of operator new[] with one argument. For the purposes of this proposal, the distinction isn't all that important.
After finding the appropriate operator new, a new-expression obtains storage with
void* p = operator new[](n1, ARGS),
It is required (3.7.3.1/2) that the return value of any operator new[], whether it's built-in or provided by the user, must be suitably aligned for objects of any type.
If T is "char" or "unsigned char" the standard requires that delta is a nonnegative multiple of the most stringent alignment constraint for objects of size less than or equal to n (5.3.4/10). Otherwise the only restriction is that delta is nonnegative.
Some implementations store the number of elements in the array at a negative offset from p1. The standard neither requires nor forbids it.
There's a predefined placement version of array operator new,
::operator new[](size_t n1, void* p),
IA-64 Specifics
On IA-64 long double is 80 bits. long double has 128-bit alignment, as do classes and unions containing long double, so sizeof(long double) is 16. All other types have at most 64-bit alignment.
What the ABI needs to specify
Proposal A
No version of operator new[] is a special case. For any array new-expression we store the number of elements in the array, as a size_t, at an offset of -sizeof(size_t) from the pointer returned by the new-expression. For any type T other than char, unsigned char, long double, or a type containing a long double, n1 = n * sizeof(T) + sizeof(size_t). For those three types, since we need to preserve long double alignment, n1 = n * sizeof(T) + sizeof(long double).
Pseudocode for new(ARGS) T[n] under this proposal:
if T = char or unsigned char, or if it has long double alignment,
    padding = sizeof(long double)
else
    padding = sizeof(size_t)
p = operator new[](n * sizeof(T) + padding, ARGS)
p1 = (T*) ((char*) p + padding)
*((size_t*) p1 - 1) = n
for i = [0, n)
    create a T, using the default constructor, at p1[i]
return p1
Proposal B
::operator new[](size_t, void*) is a special case. For that version of operator new[] only, n1 = n * sizeof(T). We do not store the number of elements in such an array anywhere.
Pseudocode for new(ARGS) T[n] under this proposal:
If the expression is new(p) T[n], and if overload resolution
determines we're using ::operator new[](size_t, void*), then
    p1 = (T*) p
    for i = [0, n)
        create a T, using the default constructor, at p1[i]
    return p1
For all other cases, same as proposal A.
Proposal A is simpler, but proposal B probably conforms more closely to user expectations.
[000210 All -- Matt] We agreed that Proposal B, where ::operator new(size_t, void*) is a special case with no cookie, is preferable to Proposal A, where all versions of array new get cookies.
We also agreed to the variation where we don't reserve space for a cookie if the type has no destructor. We're calling it Proposal C. We need a writeup, but we should be able to close this issue next week.
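The user expectation behind Proposal B can be shown with a short sketch: an array created with the predefined placement form, for an element type without a destructor, is expected to start exactly at the supplied address. The function name is hypothetical, and the result reflects compilers that follow this resolution.

```cpp
#include <cassert>
#include <new>

// Returns true if new(buf) int[4] begins the array exactly at buf,
// i.e. no cookie space was taken from the user's buffer.
bool placement_array_has_no_cookie() {
    alignas(int) unsigned char buf[4 * sizeof(int)];
    int* p = new (buf) int[4];
    return static_cast<void*>(p) == static_cast<void*>(buf);
}

void check_placement_array() {
    assert(placement_array_has_no_cookie());
}
```

Had a cookie been reserved, the buffer above would be too small, which is the surprise the summary of this issue warns about.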
[000302 CodeSourcery -- Mark]
In particular, there are situations in which we do not allocate cookies, even when allocating arrays of class type. But, the standard guarantees that:
When a delete-expression is executed, the selected deallocation function shall be called with the address of the block of storage to be reclaimed as its first argument and (if the two-parameter style is used) the size of the block as its second argument.)
That paragraph doesn't require that the class type have a non-trivial destructor.
I think that means the first bullet:
(Note: if the usual array deallocation function takes two arguments, then its second argument is of type size_t. The standard guarantees that this function will be passed the number of bytes allocated with the previous array new expression. See [class.free] for details.)
[000302 All]
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-22 | RTTI for reference types | data | closed | CodeSourcery | 000119 | 000203 |
| Summary: __reference_type_info does not appear to be necessary. | ||||||
| Resolution: Remove it. | ||||||
[000119 CodeSourcery -- Nathan] When would a type_info of a reference ever be generated? (So why __ref_type_info?)
[000126 CodeSourcery -- Nathan]
[000128 Cygnus -- Jason] Based on that, I definitely think reference type_info can go away.
[000203 All] Remove __ref_type_info.
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-23 | RTTI class descriptors | data | closed | CodeSourcery | 000124 | 000302 |
| Summary: Resolve several questions about the RTTI representation of class types. | ||||||
| Resolution: See the Draft C++ ABI for IA-64. | ||||||
[000124 CodeSourcery -- Nathan]
si_class_type_info
is for a single nonvirtual inheritance hierarchy.
Presumably this single non-virtual inheritance is between the derived
and the base (the base may or may not have multiple or virtual bases).
An additional constraint is that, if the derived class is polymorphic,
the base class is too. Rationale: if the derived class adds
polymorphism, the base will be at a non-zero offset.
[000126 CodeSourcery -- Nathan] More useful for dynamic cast (and possibly catch matching) {than the current set of flags -- editor} would be the following flags:
Note that the virtual/non-virtual and public/non-public are not mutually exclusive. Also note that I have not actually implemented anything with these flags, so I could be wrong.
[class.mi] (clause 10.1) provides good examples of "diamond shaped."
Paragraph 4 gives a non-diamond shaped graph with multiple base objects.
At least one of the multiply inherited base objects must be non-virtual.
struct L {};
struct A : L {};
struct B : L {};
struct C : A, B {};
There are two distinct L base objects in C. C would have the non-diamond shaped multiple inheritance flag set. A, B and C would have the non-virtual base flag and public base flag set.
Paragraph 5 gives a diamond shaped graph.
Such a multiply inherited base object must be virtual.
struct V {};
struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};
This time C would have the diamond shaped flag set. A, B & C would have the virtual base flag set and the public base flag set. C would also have the non-virtual base flag set.
Paragraph 6 gives a graph which contains both features.
Here there is one non-virtual base and one virtual base.
struct B {};
struct X : virtual B {};
struct Y : virtual B {};
struct Z : B {};
struct AA : X, Y, Z {};
In that example, AA would have both diamond and non-diamond flags set. All would have the public base flag set, AA & Z would have the non-virtual base flag set, AA, X & Y would have the virtual base flag set.
The above is treating the non-virtual and virtual base flags differently, they should have the following meaning:
My thinking is that for dynamic_cast, having such information will allow pruning parts of the inheritance graph walk. For instance, there can only be distinct multiple target base objects when the non-diamond shaped flag is set in the complete object. When we find them, the base sub-object started from can only be a common base for both of them, if the diamond shaped flag is set in the complete object. Alternatively, there can only be (at most) one instance of the target type when the non-diamond shaped flag is clear. When we find it via a non-public path, there could only be an alternative public path if the complete object has the diamond shaped flag set. Similar pruning should be possible for catch matching. Without such information, the graph walk has to be pessimistic, which I believe will slow down the common case.
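As a hedged illustration of why repeated versus shared bases matter to dynamic_cast, the sketch below uses hypothetical types (`Side`, `RepL`, `DiaV` and friends are not taken from the ABI document). A cross-cast to a base type that occurs twice in the complete object must fail as ambiguous, while a cast to a single shared virtual base succeeds. A runtime that knew from a flag that no base type is repeated could stop at the first match.

```cpp
#include <cassert>

// Hypothetical unrelated polymorphic base used as the starting point of the cast.
struct Side { virtual ~Side() {} };

// Repeated (non-diamond) inheritance: RepC contains two distinct RepL subobjects.
struct RepL { virtual ~RepL() {} };
struct RepA : RepL {};
struct RepB : RepL {};
struct RepC : RepA, RepB, Side {};

// Diamond inheritance: DiaC contains one shared DiaV subobject.
struct DiaV { virtual ~DiaV() {} };
struct DiaA : virtual DiaV {};
struct DiaB : virtual DiaV {};
struct DiaC : DiaA, DiaB, Side {};

// Returns true when the cast from Side* finds a unique public Target subobject.
template <class Target>
bool cross_cast_succeeds(Side* s) {
    return dynamic_cast<Target*>(s) != nullptr;
}
```

With a `RepC` object, `cross_cast_succeeds<RepL>` is expected to report failure and `cross_cast_succeeds<RepA>` success; with a `DiaC` object, `cross_cast_succeeds<DiaV>` is expected to succeed.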
[000126 CodeSourcery -- Nathan]
__si_class_type_info is documented for
a single non-virtual hierarchy,
and __vmi_class_type_info for a class containing
(directly or indirectly)
a multiple or virtual inheritance component.
My mistake was to use __si_class_type_info
for a class with a single base,
regardless of the
hierarchy within the base (that is the current g++ behaviour).
__si_class_type_info
is for both public and non-public inheritance
(again, something I'd not noticed, thinking it was for public only).
For this to work,
the __class_type_info flag bit 0x8 'non-publicly inherited base'
must mean `non-publicly inherited direct base'.
Please can the wording about bases here explicitly say
`direct base,' `indirect base,' or `direct or indirect base.'
The description currently uses `contains' and `has' which
are open to interpretation.
In dynamic casting, access is important. In a cross cast from base A via complete type C to another base B, both B and A must be publicly accessible from C. It might be that dynamic_cast locates B, and, knowing that C does not have multiply inherited subobjects, determines it need look no further. However, it must determine access. If C has no non-public direct or indirect bases, access must be OK, without further inspection. However the hint flag 0x8 can't be indicating that, as it is only for direct bases. (This was the one case where I was able to take advantage of these flags, but alas it seems I can't.)
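The access requirement on cross-casts can be seen in a small sketch. The types below are hypothetical and only illustrate the language rule being discussed: a cross-cast succeeds when the target base is public in the complete object and yields a null pointer when it is not.

```cpp
#include <cassert>

struct SrcBase { virtual ~SrcBase() {} };
struct DstBase { virtual ~DstBase() {} };

struct PublicBoth : SrcBase, DstBase {};          // both bases are public
struct PrivateDst : SrcBase, private DstBase {};  // target base is not public

// Builds a complete object and attempts the cross-cast SrcBase* -> DstBase*.
template <class Complete>
bool cross_cast_ok() {
    Complete c;
    SrcBase* src = &c;
    return dynamic_cast<DstBase*>(src) != nullptr;
}
```

A hint flag saying "no non-public bases anywhere in the hierarchy" would let the runtime skip the access check that distinguishes these two cases.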
[000127 All]
We decided on Thursday that your "mistakes" are what we want.
__si_class_type_info will be for any class with a
single direct base at offset 0 which is public and non-virtual.
We also decided that the flags should move from
__class_type_info into __vmi_class_type_info,
and that the polymorphic flag should be removed.
[000126 CodeSourcery -- Nathan] I think this moving of the flags is a mistake. If I understood correctly, they indicated information about direct and indirect bases (whether there was virtuality anywhere in the hierarchy for instance). Such information can speed up dynamic cast. When walking the inheritance graph, we can take some early outs, if we know there are no multiple subobject types within the complete graph. With the flags in every class's type_info, it becomes easier to get hold of that info. With it only for vmi classes, we have to remember `unknown' when presented with a complete object of si type, and fill the information in when/if we find a vmi base.
Another case is in a potential cross-cast case, which I had in the previous email. Suppose we've found the target base, which we know is unique, but not found the source base (because we early outed, maybe). To be a valid cross-cast both the source and target base objects must be public in the complete object. If we know the complete hierarchy has no non-public bases, there's no need to search for the source base in this case.
[000129 Cygnus -- Jason]
I think I'd rather pay that small performance hit than add a word to the type_info for each class. Matt, would this affect locales?
... cross-casts only come up in the context of classes with multiple bases, so it wouldn't make sense to look for this in single inheritance classes anyway.
[000127 All]
[000203 All]
[000203 SGI -- Jim]
I moved the flags from __class_type_info to __vmi_class_type_info, discovering that they don't need to share space with the offset field in the __base_class_info records, but rather with the base class count. But, the __base_class_info has its own flags (virtual and public) which can reasonably share a doubleword, as we were discussing for the other flags this morning. So I specified that. Note that I put the flags in the low byte rather than the high byte. That is because the offset is signed, and it is likely that implementations will sign-extend (signed doubleword>>8), but not (doubleword & 0x00ffffffffffffffll).
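The low-byte choice can be sketched as follows. The helper names are hypothetical, the flag field is taken to be 8 bits wide as the shift by 8 above suggests, and the code assumes that `>>` on a negative signed value is an arithmetic shift (true of the usual compilers, and guaranteed from C++20).

```cpp
#include <cassert>
#include <cstdint>

// Packs a signed offset into the upper 56 bits and flags into the low byte.
inline std::int64_t pack_offset_flags(std::int64_t offset, unsigned flags) {
    // Multiply instead of shifting left so that negative offsets are well defined.
    return offset * 256 + static_cast<std::int64_t>(flags & 0xffu);
}

// Recovers the offset; the arithmetic right shift sign-extends for free.
inline std::int64_t unpack_offset(std::int64_t word) {
    return word >> 8;
}

// Recovers the flags from the low byte.
inline unsigned unpack_flags(std::int64_t word) {
    return static_cast<unsigned>(word & 0xff);
}
```

Had the flags been placed in the high byte instead, extracting a negative offset would need an explicit sign extension after masking.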
After an exchange with Nathan, I reinstated his first flag (contains non-diamond multiple inheritance).
[000210 All -- Matt]
Minor corrections to RTTI discussion in data layout document: In section 7c, which describes the vmi_flags, flag 0x01 is documented incorrectly. It says "class has non-diamond multiple inheritance", which isn't quite right. We're really talking more about repeated inheritance: having multiple subobjects of the same type.
Also in vmi_flags, Jason questions whether flags 0x04 and 0x08 are necessary. What do we really need "has virtual base(s)" and "has non-virtual base(s)" for? Jason has sent email to Nathan about this.
Naming issue: we decided to put all of our type_info subclasses
in namespace abi, not namespace std. This means, of course,
that they can't go in any of the standard headers. Rather than
inventing multiple header names, we would like to put everything
(unwinding longjmp, type_info subclasses, etc.) into one quasi-
standard header. We propose the name
Issue A23 can almost be closed. The only thing we need to
resolve is whether to keep the two flags that Jason is unsure about.
[000302 All -- Matt]
[000126 CodeSourcery -- Nathan]
The amended (25th Jan) RTTI specification says:
I don't believe this is the case,
the example I posted a couple of weeks back pointed this out.
Here it is, in a slightly more compact form
I believe this is well formed and should not abort.
The RTTI document indicates that `typeid (A const * const *)' and
`typeid (B const * const *)' will produce __pointer_type_info chains
that end at a weak symbol reference for A and B respectively.
These will both resolve to zero.
How is catch matching able to determine the difference between
`A const * const *' and `B const * const *' under these circumstances?
If this is a shortcoming of the ABI,
or considered a defect in the standard, it should be documented.
There seems to be no discussion of this case.
[000127 All]
[000128 CodeSourcery -- Nathan]
In the catch matching,
the type_infos for `A const *const *' and `A const *' will be:
and those for `B const *const *' and `B const *':
I fail to see how the catch matcher can get different results comparing
__tiPP1B to __tiPCPC1A as opposed to comparing __tiPP1B to __tiPCPC1B.
They both look like qualification conversions of pointers to pointers
to incomplete type.
In the first case we'll end up comparing __tiP1B to __tiPC1A,
which still is a valid qualification conversion,
then have two NULL pointers for the pointed to types,
which somehow we have to tell apart.
In the second case we'll end up comparing __tiP1B to __tiPC1B,
and again have two NULL pointers for the pointed to types,
but this time we have to consider them the same type.
I don't see anything in [conv.qual] saying that qualification
conversions don't have to deal with incomplete types.
N.B.: old-abi g++ seg faults on the above code because it does wander
into the NULL pointers.
[000129 Cygnus -- Jason]
I think that leaves us with something like what EDG does now:
namely, comparisons are done by comparing the addresses of
one-byte commons rather than of the type_info nodes themselves.
Then we could emit incomplete info in one file and complete info in
another file and they would compare the
same because both refer to the same ID proxy.
We could mangle the complete and incomplete versions differently,
so they would not be combined by the linker.
This would also change how we refer to type_infos;
under the current scheme,
references to type_infos in the EH type table need to be via
relocs that will be resolved by the dynamic linker at runtime.
If we don't need to compare addresses,
we could use gp-relative references.
Of course,
we'd still have the absolute references in the type_infos to the ID proxies,
so we're no better off.
[000130 CodeSourcery -- Nathan]
[000203 All]
Since all we need from the common block is a distinct address,
we may want to float a base ABI proposal for a new symbol type which is
resolved by the linkers to a unique address without allocating storage.
[000210 All -- Matt]
A class's __class_type_info object and its comdat proxy both receive
mangled names. We must make sure that the proxy's mangled name is the
same for all complete and incomplete declarations of a class, that the
mangled name of the __class_type_info object is the same for all
complete declarations of a class, and that the mangled name of the
__class_type_info object is different for incomplete declarations than
for complete declarations. One way to achieve this is to make
__class_type_info objects for incomplete declarations static.
We add a new flag to __pointer_type_info; let's say bit 0x4. If
this is set, it means we have a pointer to an incomplete type (or
pointer to pointer to incomplete type, etc.)
We compare two __class_type_infos for equality by pointer comparison
of the id_proxy_ptr fields. We compare two __pointer_type_infos for
equality by looking at the addresses of the type_info objects,
*unless* the incomplete bit is set in at least one of them. If the
incomplete bit is set, we have to compare the pointed-to types. For
everything other than classes and pointers we can just use address
equality of the type_info objects themselves.
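A minimal sketch of these comparison rules, using simplified stand-in records rather than the real RTTI classes. Apart from `id_proxy_ptr` and the 0x4 bit mentioned above, every name here is hypothetical, and the requirement that the remaining flag bits agree is an assumption of the sketch.

```cpp
#include <cassert>

// Simplified stand-in for a type_info record.
struct TypeRec {
    enum Kind { kOther, kClass, kPointer } kind;
    const void* id_proxy_ptr;  // classes only: address of the comdat proxy
    unsigned flags;            // pointers only: qualifier bits plus incomplete bit
    const TypeRec* pointee;    // pointers only: the pointed-to type
};

const unsigned kIncomplete = 0x4;  // "let's say bit 0x4"

inline bool same_type(const TypeRec* a, const TypeRec* b) {
    if (a == b) return true;   // identical records always denote the same type
    if (a->kind != b->kind) return false;
    switch (a->kind) {
    case TypeRec::kClass:
        // Classes compare by proxy address, not by record address.
        return a->id_proxy_ptr == b->id_proxy_ptr;
    case TypeRec::kPointer:
        // Distinct addresses are conclusive unless an incomplete bit is set.
        if (!((a->flags | b->flags) & kIncomplete)) return false;
        // Assumption of this sketch: the other flag bits must agree.
        if ((a->flags & ~kIncomplete) != (b->flags & ~kIncomplete)) return false;
        return same_type(a->pointee, b->pointee);
    default:
        return false;          // everything else relies on address equality
    }
}
```

Two pointer records built in different files, one over a complete and one over an incomplete class record, compare equal here as long as both class records refer to the same proxy.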
In response to Jason's 000129 question: we can't use gp-relative
references for type_info objects because we're only using comdat
proxies for __class_type_info, not for other kinds of type_info
objects.
In response to Nathan's 000130 question: this is the reason to
give the complete and incomplete __class_type_info objects different
mangled names. That way a complete __class_type_info object in a
DSO won't be overridden by an incomplete __class_type_info object
in the executable.
At the very end of this meeting we got a suggestion from Christophe
for a completely different mechanism. We agreed that we can't evaluate
it without a writeup. The suggestion: abandon these comdat proxies
altogether. Instead we have a new type_info class,
__incomplete_class_type_info. Comparisons involving two
__class_type_info objects use address equality, comparisons involving
two __incomplete_class_type_info objects, or a __class_type_info and
an __incomplete_class_type_info, do string comparison on the name. We
still would have an incomplete bit in the __pointer_type_info class,
which, again, we would use to determine whether two
__pointer_type_info objects with different addresses might
nevertheless represent the same pointer type.
[000309 All]
[000314 SGI -- Jim]
[000330 All]
When the specified width of a bitfield exceeds the size of the declared type,
the standard specifies that the accessible field is
to be padded to the specified width,
with the location of the padding implementation-defined.
That is, the accessible field could be placed at the beginning,
at the end, or in the middle of the specified bits.
(Note that such declarations are explicitly disallowed by the C 2000
draft, so this is not a C ABI issue.)
[000204 SGI -- Jim]
It seems to me that the situation that makes it interesting is the
following:
One could express this by the following rule:
[000221 CodeSourcery -- Mark]
The ABI document says that a NULL pointer-to-member function has
`ptr == 0'. It does not, however, say whether or not a NULL
pointer-to-member function also has `adj == 0'.
I believe that this should be specified as well so that code generated
to do comparison of pointers to members (of the same type)
looks like:
So, I would say:
It's occurred to me that this imposes some overhead on casting
pointers-to-members around: now when you convert from a base pointer
to member to a derived version (or vice versa), you can't just adjust
the `adj' member willy-nilly; instead, you have to check first whether
or not the pointer is NULL.
So, I'm not sure any more which scheme is preferable -- but we
definitely need to say clearly which we want.
[000222 CodeSourcery -- Mark]
So, it would be helpful if we were to add:
[000229 SGI -- Jim]
Comparisons (5.10) of pointers to virtual member functions are undefined.
So, for pointer-to-function-member comparisons,
we only need to worry about non-virtual members and null.
Since the representation stores the actual address of the function descriptor,
we should be able to just compare the pointers, and ignore the adjustment.
For conversions between base classes,
it seems that we need only modify the adjustment,
and then only if one is not primary for the other.
For conversion to null,
it seems that we need only set the pointer to 0,
and can ignore the adjustment.
[000302 All]
Represent NULL by a 0 pointer, with the adjustment unspecified.
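One way to read this resolution is sketched below on a model of the {ptr, adj} pair. The struct and function names are hypothetical, and the equality rule shown (consult `adj` only for non-null values) is an assumption of the sketch, not wording from the ABI document.

```cpp
#include <cassert>
#include <cstddef>

// Model of a pointer to member function; not a real compiler type.
struct MemFnPtr {
    std::ptrdiff_t ptr;  // function address or vtable information; 0 means NULL
    std::ptrdiff_t adj;  // this-adjustment; unspecified when ptr == 0
};

// NULL is decided by ptr alone.
inline bool is_null(const MemFnPtr& p) { return p.ptr == 0; }

// Equality ignores adj when both operands are NULL.
inline bool equal(const MemFnPtr& a, const MemFnPtr& b) {
    if (a.ptr != b.ptr) return false;
    return a.ptr == 0 || a.adj == b.adj;
}

// Base/derived conversion adjusts adj unconditionally; no NULL test is needed
// because a NULL value stays NULL whatever adj becomes.
inline MemFnPtr convert(MemFnPtr p, std::ptrdiff_t delta) {
    p.adj += delta;
    return p;
}
```

Leaving the adjustment unspecified trades a slightly more careful comparison for a conversion with no branch.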
[000222 CodeSourcery -- Mark]
We haven't specified a way to represent a NULL pointer to data member.
G++ presently adds one to the offset,
allowing zero to serve as the NULL pointer to member.
[000223 CodeSourcery -- Mark]
What is the value for the NULL pointer to data member?
I guess -1 would do,
unless there are cases I can't think of where the pointer
to member would legitimately have a negative value.
Maybe 0x8000000000000000 is better...
It's illegal to do this if the base is virtual. But, that's the
only case in which the `this' pointer can increase.
[000229 SGI -- Jim]
From the Standard:
So we can conclude that,
since we always allocate non-virtual bases before data members,
any base object in a derivation chain will have its base address
smaller than any of the data members declared in members of the chain.
Therefore, the offset represented by a pointer-to-data-member
will always be non-negative,
even after the permitted conversions above.
So, we could either use -1 for NULL, or use 0 and increment the offset.
0x800...000 is an unnecessary complication.
[000302 All]
Represent NULL by the value -1.
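A small model of this choice, with hypothetical names: the pointer to data member is treated as a plain signed offset, so offset 0 remains a valid non-null value (the first member), and conversions that add a base offset have to leave -1 untouched.

```cpp
#include <cassert>
#include <cstddef>

typedef std::ptrdiff_t DataMemberOffset;      // hypothetical model type
const DataMemberOffset kNullDataMember = -1;  // NULL pointer to data member

// Base-to-derived conversion adds the base subobject's offset,
// but must preserve the NULL value.
inline DataMemberOffset to_derived(DataMemberOffset p, std::ptrdiff_t base_offset) {
    return p == kNullDataMember ? kNullDataMember : p + base_offset;
}

// Applies a non-null model pointer to an object address.
inline void* apply(void* object, DataMemberOffset p) {
    return static_cast<char*>(object) + p;
}
```

Because valid offsets are never negative under the argument above, -1 cannot collide with a real member.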
[000406 CodeSourcery -- Nathan]
The current RTTI proposal loses the property that all type_info objects
can be compared for equality and orderability by address comparison.
Instead, type_info::operator== must involve a virtual function call
or unconditionally call strcmp.
(An alternative of testing the typeid of the
polymorphic type_info objects results in infinite recursion!)
Here are two proposals which reinstate the address equality property.
The first is rather different to the current scheme, but when I was
done documenting it, I realised there was a minor modification to the
current scheme, which partially reinstates the address equality. I
present both for consideration. Feel free to shoot them down ...
The base class of these is:
This contains a pointer to the type_info object
produced by the typeid operator,
for whatever type this is describing.
That will be a unique object.
There are a number of necessary derivations of this type,
which can be taken largely unaltered from the current proposal.
It is necessary to distinguish function types, so that catch matching
can distinguish a data pointer object from a function pointer object.
Other types (fundamental, enum, array) need not be distinguished,
and can be represented by an abi::__type_info object.
(Or we could keep the current proposal of having separate derivations
for these.)
Pointers are as they currently are,
other than the base class change.
We still need the incomplete target flag.
Pointers to member could be a sibling class of non member pointers.
However, they do share common functionality,
and IMO it makes sense to derive from __pointer_type_info.
The __class_type_info, __si_class_type_info and __vmi_class_type_info
are unchanged, other than the change to __class_type_info's base.
The vtable slot -1,
(which currently holds a pointer to the std::type_info object for a class),
points to the abi::__class_type_info object.
To implement typeid(X),
where X is polymorphic,
involves an additional indirection through the
abi::__type_info base to return the `type' member.
dynamic_cast uses the abi::__class_type_info object pointed to in the vtable.
throwing and catch matching use the abi::__type_info object
for the type being thrown or caught.
As with the current proposal,
an incomplete type is represented by an abi::__class_type_info object.
Note that its abi::__type_info base
will point to the unique std::type_info object for that type,
regardless of whether a DSO completes the type.
This incomplete type is prevented
from preempting the complete type information.
Also direct or indirect pointers to incomplete have their incomplete
flag set,
and are also prevented from preempting the equivalent pointer to
complete object.
During catch matching,
comparison of pointers can compare the abi::__pointer_type_info addresses,
unless either has the incomplete flag set,
in which case the std::type_info objects pointed to must be compared.
(The std::type_info objects could be compared even when the incomplete
flags are clear.)
There are two or three naming schemes with this proposal:
Advantages of this proposal are:
The cost of this proposal is
The first proposal is essentially
using the std::type_info objects as unique objects,
via which incomplete types can be compared.
We already have such a unique object candidate --
the NTBS name member of std::type_info.
Currently we've not said anything about that.
If, however, we give that NTBS comdat linkage, a unique name,
and prevent it being commonized with other strings, we have a proxy.
These features can be obtained by treating it as a
`const char []' rather than a string constant.
type_info equality and orderability can now use the address of this array,
rather than the type_info objects themselves.
We can do this in all cases,
even though it is only necessary for the pointer to incomplete case,
as that avoids a virtual function call.
Here is an implementation of type_info::operator==
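The listing itself does not survive in this log. As a rough sketch of the idea only (not Nathan's code), equality and ordering keyed on the address of the name array could look like this, using a hypothetical model struct in place of std::type_info:

```cpp
#include <cassert>
#include <functional>

// Model of a type_info whose identity is the address of its name array.
struct NameKeyedInfo {
    const char* name;  // points at the unique NTBS for the type
};

inline bool same_type(const NameKeyedInfo& a, const NameKeyedInfo& b) {
    return a.name == b.name;  // pointer comparison, no strcmp
}

inline bool before(const NameKeyedInfo& a, const NameKeyedInfo& b) {
    // std::less yields a total order even over unrelated pointers.
    return std::less<const char*>()(a.name, b.name);
}

// Named arrays rather than string literals, so each has exactly one address
// and cannot be merged with an identical string elsewhere.
const char name_of_A[] = "1A";
const char name_of_B[] = "1B";
```

Two records that refer to `name_of_A` compare equal even if the records themselves live at different addresses, which is the property needed for the incomplete-type case.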
We need to specify the naming scheme for the NTBS.
The advantages of this are
The costs over proposal A are
[000411 CodeSourcery -- Nathan]
Issue 2
The algorithm for collation order of type_infos,
cannot simply compare addresses for non-pointer types,
and complete pointer types.
Using string collation only works
when one of the types is a pointer with the incomplete_mask set.
There are two difficulties.
Firstly, we might be
comparing a non-pointer type_info with a pointer type_info. We need to
determine this and DTRT WRT the incomplete flag of the pointer
type_info. To do that will require dynamic_cast or typeid'ing the
type_infos. Secondly, assume we are just comparing pointer type_info's.
We have two pointers to complete, Aptr and Bptr, and a third pointer to
incomplete, Cptr.
There is nothing maintaining the consistency of the results of these
three tests -- result 1 is uncorrelated with results 2 & 3.
Therefore type_info::before must be implemented as string compare on
the type's names. We lose any advantage of commonizing the type_infos.
Issue 3
17.4.4.4 prevents an implementation adding member functions to one
of the std classes, except in particular circumstances. About the only
leeway given is whether a particular non-virtual function is inline or
not. So I presume we're not permitted to add virtual member functions
to std::type_info (18.5.1). The rules given in 17.4.4.4 specifying what
member functions can be added look like applications of the as-if rule,
but there must be something deeper going on, as if that was all, it
wouldn't be mentioned. I'm not sure how a conforming program could tell
whether additional functions had been added.
The ABI requires us to add virtual functions to type_info.
For instance the implementation of operator== will require it to
deal with pointers to incomplete. G++ needs several for catch matching.
Issue 4
5.2.8 talks about typeid returning something derived from type_info,
but the footnote mentioning extended_type_info implies to me that
typeid always returns objects of the same type.
Again, I'm not sure how a conforming program could tell.
The two proposals above resolve these issues.
Proposal A resolves issues 2, 3 & 4,
whilst proposal B resolves issue 2 only,
and will leave us (slightly) non-conformant.
[000413 All]
Proposal B resolves the remaining issue,
and the group is inclined to accept it,
while considering whether to go further with A.
Jim will integrate (and has integrated) B into the
Draft C++ ABI for IA-64.
[000504 All]
[000407 CodeSourcery -- Nathan]
__pointer_to_member_type_info is derived from type_info.
I strongly recommend it be derived from __pointer_type_info,
as it requires much of the same functionality,
and has the same meanings of its flags.
By subclassing __pointer_type_info, much code could be reused.
Thus point 8 of the rtti classes would become
[000411 CodeSourcery -- Nathan]
incomplete_mask is an inclusive or of the other two flags.
incomplete_klass_mask is only used by __pointer_to_member_type_info,
and __pointer_type_info knows nothing about it (it simply examines the
other two).
A __pointer_type_info or __pointer_to_member_type_info sets the
incomplete_mask and incomplete_chain_mask, if the target is an
incomplete type, or has its incomplete_mask set.
A __pointer_to_member_type_info sets the incomplete_mask and the
incomplete_klass_mask, if the class of the member is incomplete.
[000411 Ed.]
[000413 All]
(Ed. note) I've added updates to the
Draft C++ ABI for IA-64.
[000504 All]
[001012 all -- Jim]
The issue here, raised originally by Martin, I will open as A-30.
Implementations will generally need additional virtual functions
associated with the type_info hierarchy to implement such functionality
as dynamic cast. Gcc for instance has functions __is_function_p,
__do_catch, __pointer_catch, ...
A program that is built from pieces from different compilers, where the
pieces come from different implementations of the hierarchy, will see
different structures, at least in the vtables, if we allow this extra
material to be arbitrary, creating a problem if such programs actually
make use of parts of the hierarchy.
We worked out the following possible solution:
The implementation will create one instance of this class for each of
the classes derived from std::type_info, and we will specify a
mangled name for it.
Now an implementation can add an arbitrary set of functions to
__cxa_aux_typeinfo, specialized to the derived class like a virtual
function, without changing the external interface (to the user) of
the hierarchy.
[001103 SGI -- Jim]
[...leaving out much discussion...]
So, after all the above, I suggest the following actions:
[001109 all]
[001019 CodeSourcery -- Mark]
I think I recall that the committee was intentionally trying to use
the tail padding of one object to save space. For example, consider:
(These are PODs, but you can easily make an equivalent non-POD
example).
Here, I think the committee wanted to give `B' size 4, by packing `d'
into the tail padding of `A'.
I think this is a mistake. David Gross came up with the following
example:
Code generator needs to copy dsize, not sizeof, unless it can prove
that the object is in a context where tail padding isn't overlaid.
Reason? Tail padding might be overlaid by a volatile field.
Hence, a non-POD that looks like
requires ld2/st2/ld1/st1 for a copy instead of ld4/st4 because we
might have
Similarly, people using memcpy to copy around POD components of
non-PODs will get burned.
This completely breaks user expectation since people routinely expect
to be able to stick a function or two into a POD without changing its
layout.
I think we should make the following changes:
Note that this still permits the empty base optimization; nvsize will
be zero, and sizeof will be 1.
There's an important difference between using the tail padding in an
empty base and the tail padding in a generic object: you know that you
never have to copy an empty base.
[001109 all]
Therefore, we have decided to eliminate the overlaying of tail padding.
Mark will provide alternate proposed wording for the ABI document.
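The copying hazard discussed in this issue can be probed with a small sketch. The types are hypothetical, and whether the derived member actually lands in the base's tail padding depends on the compiler and ABI in use; the check only reports what the current implementation did.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base with a user-provided constructor; typically one byte of
// tail padding follows the char member.
struct Base {
    Base() : s(0), c(0) {}
    short s;
    char c;
};

// The derived member may or may not be placed in Base's tail padding.
struct Derived : Base {
    Derived() : d(0) {}
    char d;
};

// True if copying sizeof(Base) bytes over the Base subobject of a Derived
// would also overwrite Derived::d.
inline bool copy_of_base_would_clobber_d() {
    Derived obj;
    const char* start = reinterpret_cast<const char*>(static_cast<Base*>(&obj));
    const char* d_addr = reinterpret_cast<const char*>(&obj.d);
    return static_cast<std::size_t>(d_addr - start) < sizeof(Base);
}
```

When the function returns true, a block copy of `sizeof(Base)` bytes into the base subobject is unsafe, which is the situation the code generator would have to guard against.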
[990623 HP -- Christophe]
The following proposal applies only to calls to virtual functions
when a this pointer adjustment is required from a base class to a
derived class.
Essentially, this means multiple inheritance, and the
existence of two or more virtual table pointers (vptr)
in the complete object.
The multiple vptrs are required so that the layout
of all bases is unchanged in the complete object.
There will be one additional vptr for each base class which already
required a vptr,
but cannot be placed in the whole object so that it shares its vptr
with the whole object.
Note: when the vptr is shared,
the base class is said to be the "primary base class",
and there is only one such class.
For the primary base class, no pointer adjustment is needed.
For all other bases, a pointer to the whole object is not a pointer
to the base class,
so whenever a pointer to the base class is needed,
adjustment will occur.
In particular, when calling a virtual function,
one does not know in advance in which class the function was actually defined.
Depending on the actual class of the object pointed to,
pointer adjustment may be needed or not,
and the pointer adjustment value may vary from class to class.
The existing solution is to have the vtable point not to the function itself,
but to a "thunk" which does pointer adjustment when needed,
and then jumps to the actual function.
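The adjustment being described can be observed from portable code. In the sketch below (hypothetical types), a virtual call made through a pointer to the second base still runs the override with a `this` that addresses the whole object, so some mechanism, whether a thunk or an offset, must have converted the pointer along the way.

```cpp
#include <cassert>

struct First  { virtual ~First() {} int a; };
struct Second {
    virtual ~Second() {}
    virtual const void* self() const { return this; }
    int b;
};

struct Whole : First, Second {
    const void* self() const override { return this; }  // sees a Whole*
};

// The two base subobjects are both non-empty, so they occupy different addresses.
inline bool bases_at_different_addresses(const Whole& w) {
    const void* f = static_cast<const First*>(&w);
    const void* s = static_cast<const Second*>(&w);
    return f != s;
}

// Calling through a Second* reaches Whole::self with an adjusted this pointer.
inline bool override_sees_whole_object(const Whole& w) {
    const Second* s = &w;
    return s->self() == static_cast<const void*>(&w);
}
```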
Another possibility is to have an offset in the vtable,
which is used by the called function.
However, more often than not, this implies adding zero.
Virtual bases make things slightly more complicated.
In that case, the data layout is such that there is only
one instance of the virtual base in the whole object.
Therefore, the offset from a this
pointer to the same virtual base may change along the inheritance tree.
This is solved by placing an offset in the virtual table,
which is used to adjust the this pointer to the virtual base.
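A short sketch (hypothetical types) of why this offset cannot be a compile-time constant: inside one complete object, two different subobjects reach the same virtual base at different distances, so each conversion has to consult run-time information.

```cpp
#include <cassert>
#include <cstddef>

struct VBase { int v; };
struct Left  : virtual VBase { int l; };
struct Right : virtual VBase { int r; };
struct Join  : Left, Right { int j; };

// Byte distance from a subobject to the VBase reachable through it.
template <class From>
std::ptrdiff_t distance_to_vbase(From* from) {
    VBase* vb = from;  // conversion to a virtual base uses run-time information
    return reinterpret_cast<char*>(vb) - reinterpret_cast<char*>(from);
}
```

For a `Join` object, the `Left` and `Right` subobjects convert to the same `VBase` address while reporting different distances. The distance from a `Left` that is itself the complete object may differ again, depending on the layout chosen by the implementation.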
My proposal is to replace thunks with offsets,
with two additional tricks:
The thunks are believed to cost more on IA64 than they would on
other platforms.
The reason is that they are small islands of code spread throughout the code,
where you cannot guarantee any cache locality.
Since they immediately follow an indirect branch,
chances are we will always encounter both a branch misprediction and an
I-cache miss in a row.
On the other hand,
a virtual function call starts by reading the virtual function address.
Reading the offset immediately thereafter should almost never cause a
D-cache miss (cache locality should be good).
More often than not, no adjustment is needed,
or the adjustment will be done at call site correctly.
In the worst case scenario, we perform two adjustments,
one static at call site, and one dynamic in the callee,
but this case should be really infrequent.
The new calling convention requires that the 'this' pointer on entry
points to the class for which the virtual function is just defined.
That is, for A::f(),
the pointer is an A* when the main entry of the function is reached.
If the actual pointer is not an A*,
then an adjusting entry point is used,
which immediately precedes the function.
In the following, we will assume the following examples:
convert_to_D and convert_to_E are likely to be at the same offset in
the vtable. This is not a problem, even if D and E are used in the
same class, such as F, because this is the same offset in different
vtables.
The fact that an offset is reserved does not mean that it is
actually used. A vtable needs to contain the offset only if it refers
to a function that will use it. An offset of 0 is not needed, since
the function pointer will point to the non-adjusting entry point in
that case.
In other words, adjustment is made only when necessary, and at a
place where it is better scheduled than with thunks. The only bad
case is double adjustment for call_Cg called with an E*. This case
can probably be considered rare enough, compared to calls such as
call_Cg called with a C*, where we now actually do the adjustment at
the call-site.
Currently, the sequence for a virtual function call in a shared
library will look as follows. I'm assuming +DD64, there would be some
additional addp4 in +DD32. The trail below is the dynamic execution
sequence. In bold and between #if/#endif, the affected code.
[990812 All]
Discussion of B-6 raises questions of impact on the above approach.
Christophe will look at the issues.
[990826 Cygnus -- Jason]
[An alternative suggestion from Jason via email.]
Rather than per-function offsets, we have per-target type offsets.
These offsets (if any) are stored at a negative index from the vptr.
When a derived class D overrides a virtual function F from a base class B,
if no previously allocated offset slot can be reused,
we add one to the beginning of the vtable(s) of the closest base(s)
which are non-virtually derived from B.
In the case of non-virtual inheritance, that would be D's vtable;
in simple virtual inheritance, it would be B's.
The vtables are written out in one large block,
laid out like an object of the class,
so if B is a non-virtual base of D,
we can find the D vtable from the B vptr.
D::f then receives a B*, loads the offset from the vtable,
and makes the adjustment to get a D*.
The plan is to also have a non-adjusting vtable entry in D's vtable,
so we don't have to do two adjustments to call D::f with a D*;
the implementation of this is up to the compiler.
I expect that for g++,
we will do the adjustment in a thunk which just falls into the main function.
The performance problems with classic thunks occur when the thunk is
not close enough to the function it jumps to for a pc-relative branch.
This cannot be avoided in certain cases of virtual inheritance,
where a derived class must whip up a thunk for a new adjustment
to a method it doesn't override.
In this case, we will only ever have one thunk per function,
so we don't even have to jump.
Except in the case of covariant returns, that is,
where we will have one per return adjustment.
But we know all necessary adjustments at the
point of definition of the function,
so they can all be within pc-relative branch range.
[Extensive discussion followed by email --
this suggestion is not completely correct,
but may be the basis of a workable solution.]
[990831 Cygnus -- Ian]
A couple of observations ...
On the state of the art:
The Microsoft approach is worth mentioning.
(I haven't seen it discussed --
though perhaps that is because of the patent situation.)
It allows zero-adjusting (i.e. non-thunking) calls for (almost)
every virtual function call in a non-virtual,
multiple inheritance hierarchy.
For those that are unfamiliar,
the idea is that all calls go via the base class vft and overriding
functions expect a pointer to the base class type.
(That is, if D::f overrides B::f, it expects the first
parameter to be of type B*, not D*.)
The callee does the necessary static adjustment to get to the
derived class 'this' pointer as needed.
It avoids requiring a thunk,
and it's often the case that the cost is zero in the callee because
the this-adjustment can be folded into other offset computations.
On the balance,
it could well win over all the other approaches being discussed here.
[Though, it may lose in some specific cases vs. Christophe's approach
where one would create additional extra entries in
the derived class vft.]
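The callee-adjusts idea described above can be illustrated in plain C++. This is only a sketch: the names A0, B, D, D_f_entry and call_through_base are hypothetical, and the free function merely stands in for the code a compiler might emit for D::f. It is not a description of any vendor's actual code generation.

```cpp
#include <cassert>

struct A0 { int a = 1; };          // hypothetical first base, so B is not at offset 0
struct B  { int b = 2; };          // stands in for the class that introduces f
struct D : A0, B { int d = 3; };   // D::f would override B::f

// Stand-in for the single entry point of D::f: it receives the
// base-class pointer and performs the static B* -> D* adjustment itself.
int D_f_entry(B* self) {
    D* derived = static_cast<D*>(self);  // constant offset, known at compile time
    return derived->d;                   // the adjustment can fold into this access
}

// Every caller passes a B*, even when it starts from a D.
int call_through_base(D& obj) {
    B* as_base = &obj;                   // caller converts to the introducing base
    assert(static_cast<void*>(as_base) != static_cast<void*>(&obj));
    return D_f_entry(as_base);
}
```

Because the callee always knows the layout of its own class, no thunk is needed in this model; the cost is the cast inside the function body.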
On when to make extra virtual function table entries for functions:
One of Christophe's suggestions is sort-of separate
from the rest of the discussion:
making extra entries in the derived class' vft for some
overridden virtual functions.
It has the benefit of giving you faster calls if you happen to be in
(or near) the derived class -- at the expense of space in the vft.
Of course, you can always make the call through the introducing base class,
so these extra entries are a pure space/time performance trade off
(w/ some unpredictable D-cache effects) and the cost/benefit analysis
will depend a little on what the rest of the strategy looks like.
The same idea is potentially applicable,
no matter what strategy you actually use for vft layout,
and different criteria for deciding what extra entries to make are possible.
For example,
creating an extra entry when overriding a function introduced in a
virtual base has the added benefit of avoiding a cast to a virtual
base at the call site.
[990909 All]
We are getting closer --
understanding of the alternatives is improving,
and Christophe may agree with the Jason/Brian proposal after more thought.
To make sure we really understand what we're agreeing to,
Jason and Christophe will write up more precise proposal(s).
[991111 jason]
We have decided that for virtual functions not inherited from a virtual base,
regular thunks will work fine,
since we can emit them immediately before the
function to avoid the indirect branch penalty;
we will use offsets in the
vtable for functions that come from a virtual base,
because it is impossible to predict what the offset between the
current class and its virtual base will
be in classes derived from the current class.
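The reason a constant adjustment cannot be used for virtual bases can be seen with a small experiment. The sketch below uses hypothetical classes V, A, X and D and measures the distance from an A subobject to its virtual base V, once in a complete A and once inside a D. The actual numbers depend on the implementation; the point is only that they need not agree.

```cpp
#include <cstddef>

struct V { virtual void f() {} virtual ~V() = default; int v = 0; };
struct A : virtual V { int a = 0; };
struct X { virtual ~X() = default; int x[4] = {}; };
struct D : X, A { int d[8] = {}; };

// Byte distance from an A subobject to its virtual base V.
std::ptrdiff_t offset_to_V(A* p) {
    V* vb = p;  // virtual-base conversion; the offset is looked up at run time
    return reinterpret_cast<char*>(vb) - reinterpret_cast<char*>(p);
}

// The same class A, in two different complete objects.
std::ptrdiff_t offset_in_complete_A() { A a; return offset_to_V(&a); }
std::ptrdiff_t offset_inside_D()      { D d; return offset_to_V(static_cast<A*>(&d)); }
```

A thunk compiled together with A could not hard-code either value, which is why the adjustment is read from a vtable slot instead.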
The calling convention is as follows:
For each virtual function defined in a class,
we add an entry to the primary vtable if one is not already there.
In particular, a definition which overrides a function inherited from
a secondary base gets a new slot in the primary vtable.
We do this to avoid useless adjustments when calling a virtual
function through a pointer to the most derived class.
When a class is used as a virtual base,
we add a vcall offset slot to the beginning of its vtable for each of
the virtual functions it provides,
whether in its primary or secondary vtables.
Derived classes which override these functions will use the slots to
determine the adjustment necessary.
As in Christophe's proposal above,
the caller adjusts the 'this' argument to
point to the class which last overrode the function being called.
The result provides both the 'this' argument and the vtable pointer
for finding the function we want.
Each virtual function 'f' defined in a class 'A' has one entry point
which takes an A*, and performs no adjustment.
The primary vtable for A points to this entry point.
For each secondary vtable from a non-virtual base class 'B' which
defines f,
an additional entry point is generated which performs the constant
adjustment from B* to A*.
For each secondary vtable from a virtual base class 'C' which defines f,
an additional entry point is generated which performs the adjustment
from C* to A* using the vcall offset for f stored in the secondary
vtable for C.
For each secondary vtable from a base 'D' which is a non-virtual base
of a virtual base 'E',
an additional entry point is generated which
first performs the constant adjustment from D* to E*,
then the adjustment from E* to A* using the vcall offset for f stored
in the secondary vtable for E.
Note that the ABI only specifies the multiple entry points;
how those entry points are provided is unspecified.
An existing compiler which uses thunks could be converted to use this
ABI by only adding support for the vcall offsets.
A more efficient implementation would be to emit all of the thunks
immediately before the non-adjusting entry point to the function.
Another might use predication rather than branches to reach the main function.
Another might emit a new copy of the function for each entry point;
this is a quality of implementation issue.
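The observable effect of the multiple entry points can be checked from ordinary C++, whatever mechanism the compiler uses to provide them. In this sketch (class names P, B and A are hypothetical) the overrider reports the 'this' value it received, so a call through the secondary base can be compared with a direct call.

```cpp
#include <cassert>

struct P { virtual ~P() = default; int p = 0; };   // hypothetical primary base
struct B {                                         // secondary base that declares f
    virtual const void* f() { return this; }
    virtual ~B() = default;
    int b = 0;
};
struct A : P, B {
    const void* f() override { return this; }      // reports the 'this' it received
};

// True when the B subobject does not start at the same address as A.
bool secondary_base_is_offset(A& obj) {
    return static_cast<void*>(static_cast<B*>(&obj)) != static_cast<void*>(&obj);
}

// Call through the secondary vtable: some entry point must adjust B* to A*.
const void* call_via_B(A& obj) { B* b = &obj; return b->f(); }

// Call through the primary vtable: no adjustment is required.
const void* call_via_A(A& obj) { return obj.f(); }
```

Both calls deliver the address of the A object to A::f, even though the B* handed to the first call points elsewhere in the object.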
[991202 all]
[990610 Matt]
One possibility is to have two Vtable entries,
which might point to different functions, different entrypoints,
or a real entrypoint and a thunk.
Another is to return two result pointers (base/derived),
and have the caller select the right one.
[990715 All]
Daveed presented his multiple-return-value scheme,
including an example that involved virtual base classes,
return values that are pointers to nonpolymorphic classes,
and other equally horrible things.
Consensus: we need to get the horrible cases correct,
but speed only matters in the simple case.
The simple case: class B has a virtual function f returning a B1*
and class D has a virtual function f returning a D1*,
where all four classes are polymorphic,
B is a primary base of D, and B1 is a primary base of D1.
(The really important case is where B1 is B and D1 is D,
but that simplification doesn't make any difference.)
Jason: Would the usual multiple-entry-point scheme work just as well?
That is, would it be just as fast as Daveed's scheme in the simple case,
and still preserve enough information for the more complicated cases?
It appears so, but we don't have a proof.
Jason will try to provide one.
[990716 Cygnus -- Jason]
Proof?
You always know what types a given override must be able to return,
and you know how to convert from the return type to those base types.
You know from the entry point which type is desired.
Seems pretty straightforward to me.
[990716 Cygnus -- Jason]
The alternative I was talking about yesterday goes something like this:
When we have a non-trivial covariant return situation,
we create a new entry in the vtable for the new return type.
The caller chooses which vtable entry to use based on the type they want.
This could be implemented several ways,
at the discretion of the vendor:
The advantage of this approach to the complex case is that we don't have to
do a dynamic_cast when faced with multiple levels of virtual derivation.
It is also strictly simpler;
Daveed's model already requires something like
this in cases of multiple inheritance.
Of course, we can always mix and match;
we could choose to only do this in cases of virtual inheritance,
or use Daveed's proposal and do this only in
cases of repeated virtual inheritance.
In that case, the multiple returns
would just be an optimization for the single virtual inheritance case.
Since we don't seem to care about the performance of
anything but single nonvirtual inheritance,
it seems simpler not to bother with multiple returns.
The remaining question is how to handle the case of nontrivial
nonvirtual inheritance:
do we use multiple slots or have the caller do the adjustment?
My inclination is to have the caller adjust.
WRT patents,
the idea of having the function return the base-most class and having
the caller adjust is parallel to the patented Microsoft scheme whereby
they pass the base-most class as the 'this' argument to virtual functions,
but the word 'return' does not appear anywhere in the patent,
so it seems safe.
[990722 All]
The group was generally agreed that the simplicity of multiple entries
in the vtable outweighed any space/performance advantage of more
complex schemes (e.g. the method Daveed described on 15 July).
Discussion focussed on whether it is worthwhile to eliminate some of
the entries in cases where they are unnecessary because the caller
knows the required conversion,
namely when the return type has a unique non-virtual subobject of the
original return type.
Agreement was reached to avoid the complication of eliminating some of
the Vtable entries.
Thus, the Vtable will have one entry for each accessible return type of
a covariant virtual function.
These may be implemented in a variety of ways,
e.g. duplicated functions, separate entrypoints, or stubs,
and the ABI need not specify the choice.
The location of the Vtable entries is part of the separate Vtable
layout issue B-6.
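A non-trivial covariant return of the kind discussed here can be reproduced with a few hypothetical classes (Pad, B1, D1, B, D). In this sketch B1 is deliberately not at offset zero in D1, so the pointer a caller receives depends on the return type it asked for; how the two results are produced (separate vtable entries, stubs, or otherwise) is left to the implementation.

```cpp
struct Pad { virtual ~Pad() = default; int pad = 0; };
struct B1  { virtual ~B1() = default; int x = 1; };
struct D1 : Pad, B1 { int y = 2; };       // B1 is not at offset 0 within D1

struct B {
    virtual B1* f() = 0;
    virtual ~B() = default;
};
struct D : B {
    D1 result;
    D1* f() override { return &result; }  // covariant return type
};

B1* call_as_B(B& b) { return b.f(); }     // caller expects a B1*
D1* call_as_D(D& d) { return d.f(); }     // caller expects a D1*
```

Calling through B yields the address of the B1 subobject, while calling through D yields the address of the whole D1, so the two entries cannot simply share one unadjusted return value.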
[990604 HP -- Christophe]
Mike (Ball) gave me what I believe is an excellent definition of
when caching is allowed. I'd like him to present it.
[990805 All]
Christophe explained that the rule is simply that,
within a call to a member function of the class,
the class Vtable may not be modified.
Between such calls, no assumption may be made.
With this observation, the issue is closed.
[990812 All]
The rule is even simpler.
Once a program changes the type of a pointer's target,
the pointer is invalidated, and its value may not be reused.
Therefore, a code sequence which repeatedly refers to the same pointer
value is invalid if the pointee's vtable has been changed.
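The rule can be illustrated with placement new, which is one way a program changes the type of a pointer's target. The class names and the helper below are hypothetical; the sketch only shows that the old pointer is abandoned and the pointer returned by the new-expression is used instead.

```cpp
#include <new>

struct Shape {
    virtual int sides() const { return 0; }
    virtual ~Shape() = default;
};
struct Square : Shape {
    int sides() const override { return 4; }
};

bool retyping_uses_fresh_pointer() {
    alignas(Square) unsigned char storage[sizeof(Square)];

    Shape* first = new (storage) Shape;     // dynamic type: Shape
    int before = first->sides();            // vtable contents may be cached here
    first->~Shape();                        // 'first' must not be reused after this

    Shape* second = new (storage) Square;   // same address, new dynamic type
    int after = second->sides();            // dispatches through the new vtable
    second->~Shape();

    return before == 0 && after == 4;
}
```

Code that kept calling through 'first' after the object was replaced would be relying on a stale pointer, so a compiler that cached the Shape vtable for it would not be at fault.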
[990624 All]
Note that putting GP in the Vtable prevents putting it in shared memory.
See B-7.
[990805 All]
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-24 | RTTI for incomplete types | data | closed | CodeSourcery | 000126 | 000330 |
Summary:
How does RTTI represent incomplete types?
Resolution:
Use class_type_info distinct from the complete type copy,
add a flag to pointer_type_info if it points to incomplete type RTTI,
and do mangled name comparison if an incomplete pointer is involved.
Note that the full structure described by an RTTI descriptor may
include incomplete types not required by the Standard to be completed,
although not in contexts where it would cause ambiguity.
#include <stdlib.h>  // abort

struct A;
struct B;
int main ()
{
  try {
    throw (B **)0;
  } catch (A const * const *) {
    abort ();
  } catch (B const * const *) {
    ; // ok
  } catch (...) {
    abort ();
  }
}
__tiPP1B:
        .long __vt_19__pointer_type_info
        .long .LC2
        .long 0
        .long __tiP1B
__tiP1B:
        .long __vt_19__pointer_type_info
        .long .LC3
        .long 0
        .long __ti1B ;; not emitted, will resolve to zero
__tiPCPC1A:
        .long __vt_19__pointer_type_info
        .long .LC1
        .long 1
        .long __tiPC1A
__tiPC1A:
        .long __vt_19__pointer_type_info
        .long .LC4
        .long 1
        .long __ti1A ;; not emitted, will resolve to zero
__tiPCPC1B:
        .long __vt_19__pointer_type_info
        .long .LC0
        .long 1
        .long __tiPC1B
__tiPC1B:
        .long __vt_19__pointer_type_info
        .long .LC5
        .long 1
        .long __ti1B ;; not emitted, will resolve to zero
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-25 | Excess-width bitfields | data | closed | IBM | 000204 | 000217 |
Summary:
C++ allows bitfields with a larger size specified than that required by
the declared type, e.g. int f: 64.
How should they be allocated?
Resolution:
Allocate the field with alignment determined as though it were the
largest integer type that fits in the specified size,
and use the first bits available in the field
(lowest order for little endian IA-64)
for the actual data.
In this case, I don't want the accessible part of i at the beginning or
the end -- I want it in the middle. Doing otherwise yields either a
badly aligned i, or wasted space.
struct s {
  short s1;
  int i: 64;
  short s2;
};
[000204 IBM -- Mark]
I disagree.
If the user wants the bitfield to be aligned in a certain place,
he has the tools to do so.
He can certainly pick a different size bitfield.
I think that this should be aligned as if it is the same size as the type,
and then the extra bits put somewhere.
Putting them afterwards is probably simpler than before,
or splitting it in the middle.
[000217 All]
The rationale for the solution chosen is that the most likely reason
for using this feature is to achieve a known allocation for an enum
type when the user does not know how big compilers will make it.
Thus, we want "enum ... e : 32;" to behave as though
the compiler allocated a 32-bit int,
even if it actually uses only 8 bits for the enum value.
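The alignment half of the resolution amounts to a small lookup. The helper below is a hypothetical sketch, assuming integer types of 8, 16, 32 and 64 bits as on IA-64; the fallback for widths under 8 bits is an assumption added for completeness and is not taken from the resolution.

```cpp
#include <cstddef>

// Alignment in bytes for an excess-width bit-field declared with
// `width_bits` bits: that of the largest integer type that fits.
// Assumes 8/16/32/64-bit integer types (hypothetical helper).
std::size_t excess_width_alignment(unsigned width_bits) {
    const unsigned sizes[] = {64, 32, 16, 8};     // candidate widths, largest first
    for (unsigned bits : sizes) {
        if (bits <= width_bits) return bits / 8;  // largest type that fits in the field
    }
    return 1;  // assumption: anything narrower than a byte is byte-aligned
}
```

Under these assumptions "int f: 64" is aligned like a 64-bit integer, and "enum ... e : 32" is aligned like a 32-bit int regardless of how many bits the enum value itself needs.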
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-26 | NULL pointers to member functions | data | closed | CodeSourcery | 000221 | 000302 |
Summary:
How are NULL pointers to member functions represented?
Resolution:
A NULL pointer is represented by a 0 value of ptr,
and the value of adj is irrelevant.
and not:
to the ABI document.
is required since in the case that p1.ptr and p2.ptr are both
zero, their `adj' fields are irrelevant.)
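One equality test consistent with the resolution can be sketched as follows. The struct MemFnPtrModel and both functions are hypothetical; they model the {ptr, adj} pair only and are not the layout or code of any particular compiler.

```cpp
#include <cstddef>

struct MemFnPtrModel {     // hypothetical model of a pointer to member function
    std::ptrdiff_t ptr;    // 0 represents NULL
    std::ptrdiff_t adj;    // 'this' adjustment; irrelevant when ptr == 0
};

bool is_null(const MemFnPtrModel& p) {
    return p.ptr == 0;     // adj is deliberately not examined
}

bool equal(const MemFnPtrModel& p1, const MemFnPtrModel& p2) {
    if (p1.ptr != p2.ptr) return false;
    return p1.ptr == 0 || p1.adj == p2.adj;  // two NULLs compare equal whatever adj holds
}
```

Comparing ptr and adj unconditionally would report two NULL values with different leftover adj fields as unequal, which is the case the extra check guards against.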
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-27 | NULL pointers to data members | data | closed | CodeSourcery | 000222 | 000302 |
Summary:
How are NULL pointers to member data represented?
Resolution:
A NULL pointer is represented by the value -1.
But, therefore, converting a non-NULL value to NULL is explicitly
permitted by the standard.
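A reserved value such as -1 is needed because offset zero is a perfectly valid data member location. The sketch below (the struct Point and the helpers are hypothetical) shows, in portable C++, that a pointer to the first member must stay distinguishable from NULL; it does not inspect the underlying representation.

```cpp
struct Point { int x; int y; };

// A pointer to the member at offset 0 must not compare equal to NULL.
bool null_differs_from_first_member() {
    int Point::* to_x  = &Point::x;   // x sits at offset 0 within Point
    int Point::* empty = nullptr;     // NULL pointer to data member
    return to_x != empty && to_x != nullptr && empty == nullptr;
}

// Ordinary use of a non-NULL pointer to data member.
int read_through(Point& p, int Point::* member) {
    return p.*member;
}
```

If NULL were represented as 0, the pointer to x and the NULL pointer would share a bit pattern and the first comparison could not succeed.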
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-28 | RTTI equality testing | data | closed | CodeSourcery | 000406 | 000504 |
Summary:
Can we get back the ability to do a simple test for RTTI equality?
Resolution:
Mangle the name NTBS for std::type_info separately,
emit it in its own COMDAT,
and use it instead of the RTTI struct,
at least if the incomplete flags are set in pointer types.
Proposal A
class abi::__type_info
{
  std::type_info const *type; // pointer to typeid(foo) object.
  virtual ~__type_info ();
  // ... other implementation defined member functions
};
class abi::__function_type_info
  : public abi::__type_info
{
  virtual ~__function_type_info ();
  // ... other implementation defined member functions
};
class abi::__pointer_type_info
  : public abi::__type_info
{
  abi::__type_info const *target; // target type of the pointer
  unsigned flags; // flags, as currently specified
  virtual ~__pointer_type_info ();
  // ... other implementation defined member functions
};
class abi::__pointer_to_member_type_info
  : public abi::__pointer_type_info
{
  abi::__class_type_info const *klass; // class of the member
  virtual ~__pointer_to_member_type_info ();
  // ... other implementation defined member functions
};
class abi::__class_type_info
  : public abi::__type_info
{
  // ... as currently defined
};
Proposal B
bool type_info::operator== (type_info const &other) throw ()
{
  return name == other.name;
}
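The pointer comparison in Proposal B only works if every type has exactly one name string in the program. The following sketch models that with a hypothetical TypeInfoModel struct and two hypothetical name functions; the inline functions stand in for the uniqueness that the resolution obtains by emitting the name NTBS in its own COMDAT.

```cpp
#include <typeinfo>

struct TypeInfoModel {     // hypothetical stand-in for std::type_info
    const char* name;      // address of the type's unique name NTBS
};

// Equality by address: no string comparison is performed.
bool same_type(const TypeInfoModel& a, const TypeInfoModel& b) {
    return a.name == b.name;
}

// One NTBS per type (illustrative strings; uniqueness is what matters).
inline const char* name_of_int()  { static const char n[] = "i"; return n; }
inline const char* name_of_long() { static const char n[] = "l"; return n; }

// What the Standard guarantees, however equality is implemented.
bool standard_typeid_agrees() {
    return typeid(int) == typeid(int) && typeid(int) != typeid(long);
}
```

Two descriptors for the same type compare equal because both hold the address of the same string, even if the descriptors themselves were emitted separately.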
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-29 | RTTI pointer-to-member | data | closed | CodeSourcery | 000407 | 000504 |
Summary:
Derive __pointer_to_member_type_info from __pointer_type_info.
Resolution:
Derive __pointer_to_member_type_info and __pointer_type_info from
a common base class __pbase_type_info.
Add a new flag to __pbase_type_info indicating that the class of a
pointer-to-member is incomplete
(propagated up a chain of pointers).
The abi::__pointer_to_member_type_info type adds one field to
abi::__pointer_type_info:
incomplete_mask = 0x8
incomplete_chain_mask = 0x10
incomplete_klass_mask = 0x20
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-30 | RTTI portability | data | closed | HUB | 001012 | 001109 |
Summary:
What must be specified to produce RTTI portability?
Are member layouts specified? Names? Virtual functions?
Resolution:
Data members of the ABI-defined type_info derived classes must be
allocated as specified, and their names are normative.
Virtual functions, beyond the Standard-specified destructor,
are implementation-specific,
and may not be referenced outside the compiler and system vendors'
runtime libraries.
std::type_info:
class __cxa_aux_typeinfo {
  ... (*__is_function_p) (...);
  ...
};
class std::type_info {
  ...
protected:
  __cxa_aux_typeinfo *__aux;
  type_info (void) { /* set up __aux */ };
};
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| A-31 | Overlaying tail padding | data | closed | CodeSourcery | 001019 | 001109 |
Summary:
Should we change the decision to overlay tail padding in class layout?
For volatile members? In general?
Resolution:
The overlaying of tail padding is eliminated,
but we will retain the treatment of empty bases.
struct A { short s; char c; };
struct B { A a; char d; };
struct S { short sh; char ch; };
struct T { S s; volatile char d; };
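The effect of the resolution on the first pair of structs can be checked with offsetof. This is a sketch with hypothetical helper names; the exact sizes are implementation-specific, so the checks only compare offsets with sizes.

```cpp
#include <cstddef>

struct A { short s; char c; };   // typically followed by one byte of tail padding
struct B { A a; char d; };

// With no overlaying of tail padding, d starts at or after the end of a.
bool d_is_outside_a() {
    return offsetof(B, d) >= sizeof(A);
}

// B therefore needs more storage than A alone.
bool b_is_larger_than_a() {
    return sizeof(B) > sizeof(A);
}
```

Had tail padding been overlaid, d could have been placed in the unused byte at the end of a, and writing the whole of a might then disturb d.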
B. Virtual Function Handling Issues
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| B-1 | Adjustment of "this" pointer (e.g. thunks) | data call | closed | SGI | 990520 | 991202 |
Summary:
There are several methods for adjusting the this pointer
for a member function call,
including thunks or offsets located in the vtable.
We need to agree on the mechanism used,
and on the location of offsets, if any are needed.
To maximize performance on IA-64,
a slightly unusual approach such as using secondary entry points
to perform the adjustment may actually prove interesting.
Resolution:
See the writeup in the Draft C++ ABI for IA-64.
Open Issues Relevant To This Discussion
1. Scope and "State of the Art"
2. Proposal and Rationale
3. New Calling Convention
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C: A, B { };
struct D : C { virtual void f(); virtual void g(); };
struct E: Other, C { virtual void f(); virtual void g(); };
struct F: D, E { virtual void f(); };
void call_Cf(C *c) { c->f(); }
void call_Cg(C *c) { c->g(); }
void call_Df(D* d) { d->f(); }
void call_Dg(D* d) { d->g(); }
void call_Ef(E* e) { e->f(); }
void call_Eg(E* e) { e->g(); }
void call_Ff(F *ff) { ff->f(); }
void call_Fg(F *ff) { ff->g(); } // Invalid: ambiguous
4. Cases where adjustment is performed
5. Comparing the code trails
// Compute the address of the vptr in the object,
// from the this pointer
// Optional, since vptroffset is often 0.
// This also adjusts to the class of the final overrider
addi Rthis=vptroffset_of_final_overrider,Rthis
;;
// Load the vptr in a register
ld8 Rvptr=[Rthis]
;;
// Add the offset to get to the function descriptor pointer
// in the vtable. Never zero, this instruction is always generated
addi Rfndescr=fndescroffset,Rvptr
;;
// (Assuming inlined stub) Load the function address and new GP
ld8 Rfnaddr=[Rfndescr],8
;;
// Load the new GP
ld8 GP=[Rfndescr]
mov BRn=Rfnaddr
;;
// Perform the actual branch to the target
// ...
// ... Branch misprediction almost always, followed by
// ... I-Cache miss almost always if jumping to a thunk
br.call B0=BRn
#if OLD_ADJUST
thunk_A::f_from_a_B:
// If the 'adjustment_from_B_to_A' is the 'adjustment_to_A' above,
// then in the new case, the vtable directly points to A::f
addi Rthis,adjustment_from_B_to_A
// In most cases, we can probably generate a PC-relative branch here
// It is unclear whether we would correctly predict that branch
// (since it is assumed that we arrive here immediately following
// a misprediction at call site)
br A::f
#endif // OLD_ADJUST
// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f:
// Can't be executed in less than 3 cycles?
addi Rvptr=class_adjustment_offset,Rvptr
;;
// This loads data which is close to the fn descriptor,
// so it's likely to be in the D-cache
ld8 Rvptr=[Rvptr]
;;
add Rthis=Rthis,Rvptr
#endif
A::f:
alloc ...
Final virtual calling convention:
| # | Issue | Class | Status | Source | Opened | Closed |
|---|---|---|---|---|---|---|
| B-3 | Allowed caching of vtable contents | call | closed | HP | 990603 | 990805 |
Summary:
The contents of the vtable can sometimes be modified,
but the consensus is that it is nonetheless always allowed to "cache" elements,
i.e. to retain them in registers and reuse them,
whenever it is really useful.
However, this may sometimes break "beyond the standard" code,
such as code loading a shared library that replaces a virtual function.
Can we all agree when caching is allowed?
Resolution:
Caching is allowed.