Release notes for 64 bit FTN95

Introduction

FTN95 creates 64 bit executables and DLLs when:

a) the option /64 is used on the FTN95 command line,
b) SLINK64 is used in place of SLINK, and
c) salflibc64.dll and clearwin64.dll are used in place of salflibc.dll.

FTN95

This initial full 64 bit release does not allow you to combine /64 with /optimise or /check (or with options that imply /check). It includes a beta version of the 64 bit debugger, SDBG64, which can be used together with /debug etc. on the FTN95 command line. Developers can still use /check etc. without /64 in order to test for run-time faults during development.

Extended precision (REAL*10) is not available when creating 64 bit applications.

SLINK64

SLINK64 can be used in:

a) command line mode
b) interactive mode or
c) script file mode

Here is an example of using command line mode...

   FTN95 prog.f95 /64
   FTN95 sub.f95  /64
   SLINK64 prog.obj sub.obj /file:prog.exe

Here is an example of using interactive mode...

   SLINK64
   $ lo prog.obj
   $ lo sub.obj
   $ file prog.exe

Here is an example of using a script file...

   SLINK64 @prog.inf

where prog.inf contains...

   lo prog.obj
   lo sub.obj
   file prog.exe

For further information see below or type...

SLINK64 /help

SLINK64 automatically scans commonly used Windows DLLs. If a Windows function located in (say) xxx.dll is reported as missing then the DLL should be loaded by using a script command of the form

   lo C:\Windows\Sysnative\xxx.dll

where C:\Windows illustrates the value of the %windir% environment variable.

Note that the initial release of SLINK64 can construct executables and DLLs but not static libraries.

SDBG64

The 64 bit debugger is provided as a beta release for users to test. It operates in essentially the same way as the corresponding 32 bit debugger.

ClearWin+

64 bit ClearWin+ was previously available for use with third-party compilers via clearwin64.dll. This DLL has now been extended for use with 64 bit FTN95. Users who have already adapted their code for use with third-party compilers can continue to use their modified code. Alternatively, native FTN95/ClearWin+ code can be used without change, apart from the following exceptions:

64 bit Microsoft Windows HANDLEs are addresses (64 bit integers). So if a Windows handle is used explicitly in Fortran code, it will currently appear as a 32 bit (KIND=3) integer and must become a 64 bit (KIND=4) integer for 64 bit applications. FTN95 has a special KIND value (7) that is interpreted as KIND=3 for 32 bit applications and KIND=4 for 64 bit applications. Alternatively, INTEGER(KIND=CW_HANDLE) can be used together with the standard INCLUDE and MOD files because CW_HANDLE is defined as a parameter with value 7. Windows HANDLEs are mainly used with %lc, %hw and some direct calls to the Windows API.
The function CLEARWIN_INFO@ now returns an INTEGER(KIND=7) value.
64 bit ClearWin+ does not currently support SIMPLEPLOT (%pl). Also a few very old graphics routines have not been ported to 64 bit ClearWin+.
The function cpu_clock@ is not available for 64 bit applications and has been replaced by rdtsc_val@, which is declared as INTEGER(KIND=4) FUNCTION RDTSC_VAL@().
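
As a minimal sketch of the first exception, the declarations below are intended to compile unchanged both with and without /64. The program and variable names are hypothetical, and the sketch assumes that the standard ClearWin+ include file windows.ins is available and defines CW_HANDLE as described above.

      PROGRAM handles
      INCLUDE <windows.ins>
      ! 4 bytes for Win32, 8 bytes with /64
      INTEGER(KIND=7) hwnd_a
      ! The same type, written with the named parameter
      INTEGER(KIND=CW_HANDLE) hwnd_b
      hwnd_a = 0
      hwnd_b = hwnd_a
      PRINT*, KIND(hwnd_a), KIND(hwnd_b)
      END

If KIND=7 behaves as described above, the two printed kind values should both be 3 for a Win32 build and both be 4 for a /64 build.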

SRC

Use the command line option /r for 64 bit applications and link the resulting .res file (together with the .obj files) via SLINK64.

Current experience suggests that using the "default.manifest" in a resource script causes the resulting 64 bit application to fail to load. However, a user supplied manifest file can improve the appearance. The text of a suitable manifest file is presented below.

RESOURCES

A RESOURCES directive can be used at the end of a 64 bit Fortran main program but it only has effect when used with FTN95 command line options /LINK or /LGO. Otherwise a separate call to SRC is required. (For Win32 main programs, FTN95 automatically adds the resources to the main object file).

Silverfrost INCLUDE and MOD files

Silverfrost INCLUDE files have been modified so that Microsoft HANDLEs have type INTEGER(KIND=7).

Silverfrost MOD files can be used without change provided they are updated to those in this release.

Note that user FTN95 MOD files for 64 bit applications may differ from those for 32 bit applications. So FTN95 uses the extension .mod64 for 64 bit MOD files whilst retaining the extension .mod for 32 bit MOD files. The corresponding object files always differ and the respective linker (SLINK or SLINK64) will reject object files of the wrong kind.

By default FTN95 uses the extension .obj for both 64 bit and 32 bit object files. For projects, both Plato and Visual Studio retain the default extension and use a system of sub-folders in order to create executables for different platforms (such as Win32, x64 and .NET) and for different configurations (such as Debug, CheckMate and Release).

Users who prefer to build their applications using batch and/or makefiles can adopt a similar sub-folder approach to that used by Plato and Visual Studio. Alternatively, 64 bit object files can be given a different extension (e.g. .o64) by using /BINARY (together with the object file name) on the FTN95 command line. In that way, 64 bit and 32 bit object files could reside in the same folder.
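
The following is a sketch of the second approach, not a tested recipe. It assumes that /BINARY takes the object file name as a separate argument and that SLINK64 accepts object files with a non-default extension; the file names are those used in the earlier examples.

   FTN95 prog.f95 /64 /BINARY prog.o64
   FTN95 sub.f95  /64 /BINARY sub.o64
   SLINK64 prog.o64 sub.o64 /file:prog.exe

A 32 bit build in the same folder would then continue to produce prog.obj and sub.obj for use with SLINK.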

Plato

The associated release of Plato is configured by default to use FTN95 when you select "Release x64" on the main toolbar. Previously this used gFortran. The default can be changed from the Options dialog.

Plato can launch the 64 bit debugger as an external application.

Redistributing

Like salflibc.dll, salflibc64.dll and clearwin64.dll can be freely redistributed with your applications and DLLs.

Additional notes on porting from 32 bit to 64 bit applications

1) When using the standard Fortran SIZE intrinsic, FTN95 with /64 returns a 64 bit integer despite the fact that this is not strictly Standard conforming. In certain very special circumstances, this change can cause existing code to fail. For example, failure will occur if SIZE(x) appears as the value of an argument to an overloaded subprogram (i.e. a subprogram that has various definitions depending on the types of its arguments). A new command line option /SIZE32 is provided in order to resolve this conflict.
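
The following sketch illustrates the kind of code that can be affected. The module and routine names are hypothetical. The generic name report has only a default INTEGER version, so a 64 bit result from SIZE would not match it. Wrapping the call in the standard INT intrinsic is one way to keep the source working in both modes without using /SIZE32.

      MODULE report_mod
      INTERFACE report
        MODULE PROCEDURE report_i4
      END INTERFACE
      CONTAINS
      SUBROUTINE report_i4(n)
      INTEGER n
      PRINT*, 'elements: ', n
      END SUBROUTINE report_i4
      END MODULE report_mod

      PROGRAM size_demo
      USE report_mod
      REAL x(10)
      ! CALL report(SIZE(x)) may fail to resolve with /64
      ! because the argument is then a 64 bit integer.
      ! INT without a KIND returns a default INTEGER.
      CALL report(INT(SIZE(x)))
      END PROGRAM size_demo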

2) It is possible that there may be some slight loss of precision when porting from 32 bit to 64 bit calculations. This is mainly because some FTN95 32 bit mode floating point calculations actually use hidden extended precision on the way to producing double or single precision results. It is therefore possible that the process of porting to 64 bits may expose a numerically unstable calculation (i.e. one that depends critically on the level of round-off error). In the same way, in extreme cases it is possible that new exceptions may appear at runtime due to floating point overflow. Overflow can occur directly or as the result of dividing by a value that has underflowed to zero. In some cases it is possible to resolve these issues by using a scaling factor in the calculations.
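
As an illustration of the scaling idea (a generic sketch with hypothetical values, not taken from FTN95 itself), consider the length of a vector with very large components. The square of 1.0d200 lies outside the REAL*8 range, so the direct formula depends on extended precision intermediates, whereas the scaled formula keeps every intermediate value close to 1.

      PROGRAM scale_demo
      REAL*8 x, y, s, r
      x = 1.0d200
      y = 1.0d200
      ! Direct form, left as a comment because x*x is about
      ! 1d400 and can overflow in pure REAL*8 arithmetic:
      !   r = SQRT(x*x + y*y)
      ! Scaled form; assumes x and y are not both zero.
      s = MAX(ABS(x), ABS(y))
      r = s*SQRT((x/s)**2 + (y/s)**2)
      PRINT*, r
      END PROGRAM scale_demo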

Further information about SLINK64

The SLINK64 command line

SLINK64 can be used in 3 ways...

1) It can use a series of commands from a file (recommended). The commands are placed in a file with the .inf or .link suffix, and SLINK64 is invoked thus:

SLINK64 file.inf

2) It can be used interactively, using the same commands as in (1).

3) It can be used from the command line. The command line form of each command is derived from the command specifications below by prefixing the command with '/'. Thus the command lo <obj file> can be coded on the command line as /lo <obj file>

SLINK64 commands

load(lo) <file> - Loads the file, which must be FTN95/SCC 64-bit object code.
map <file> - Requests a link map, to be placed in the specified file. If the file argument is omitted, the map is placed in a file whose name is derived from the name of the DLL or EXE file being created.
file <exe or dll file> - Completes the linking operation and puts the result in the given file name. Note that the choice of suffix (DLL or EXE) determines the type of file created. Currently all entry points in the code are exported in the case of a DLL.
windows - This command forces the creation of a WINDOWS application, which does not use a console. This is normally used in conjunction with ClearWin+ code.
load(lo) <file.dll> - Uses the entry points in the specified DLL to satisfy calls in the code. The DLL must be available at run-time.
load(lo) <file.res> - Loads a resource file created with SRC using the /r switch. This is the same SRC command used in 32-bit mode except that the /r switch must be used.
image_base <hex address> - Specifies the base address for the link (not normally required, and can be overwritten at run time by Windows).
stack_size <hex number> - Specifies the stack size. The default value is 0x1000000 (16 MB).
alias <name> <alias> - Sets up an alias to an external name when making a DLL. Note that the names are case sensitive. This was added to enable gFortran to call a DLL built with FTN95. It circumvents the problem that gFortran uses lower case names while FTN95 uses upper case names! It may have other specialised uses.
help - Prints out abbreviated help information to the console.
quit(q) - Quits SLINK64 without saving anything.

Typical use

SLINK64 is automatically called when /link or /lgo is used on the FTN95 command line. The name of the executable or DLL can be supplied after /link (this is optional for executables but mandatory for DLLs). Also /stack can be included followed by the stack size as a number of megabytes. /map can also be used in this context.
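
For example, the following command lines sketch the simplest cases. The file names are hypothetical, and it is assumed that the output name follows /link as a separate argument.

   FTN95 prog.f95 /64 /link
   FTN95 prog.f95 /64 /lgo
   FTN95 mylib.f95 /64 /link mylib.dll

The first line compiles and links an executable, the second also runs it, and the third names the DLL to be created.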

The WINAPP directive in the Fortran code creates a Windows application and this directive can optionally be followed by the name of a resource script. Alternatively a resource script can be included by placing the script after the main program by using the RESOURCES directive.

Here are the required SLINK64 commands for three slightly more complicated scenarios:

1) To link a simple program that uses a DLL:

The file (say) ExtraDLL.dll is scanned for entry points but it isn't incorporated in MyProgram.exe - so MyProgram.exe will require the DLL somewhere on the path at runtime.

     lo MyProgram.obj
     lo ExtraDLL.dll
     file MyProgram.exe

2) To link a number of files to create a DLL that exports all subroutine/function names:

     lo file1.obj
     lo file2.obj
     lo file3.obj
     file MyLibrary.dll

3) To create a windows program with some ClearWin+ code that uses resources:

The resources are prepared by:

   
     SRC MyResources.rc /r

Then the slink commands are:

   
     lo MyProgram.obj
     lo MyResources.res
     windows
     file MyProgram.exe

Further general information about 64 bit FTN95

Programs compiled with FTN95 using the /64 option use the AMD64 instruction set (subsequently adopted by Intel, and referred to as x64 or x86_64), which is almost universally available on modern PCs. This code cannot be mixed with legacy 32-bit code, nor can it access legacy 32-bit DLLs. 64-bit object files must be linked using the new utility SLINK64. This object file format is incompatible with all third-party link utilities.

The default INTEGER remains a 32 bit integer (with a maximum value of 2³¹-1), so INTEGER*8 (8-byte) variables must be used to index extremely large arrays. These variables are implemented in a more efficient and natural way in 64 bits. Note that some arrays that would not fit in the old 4GB limit may still be indexable using default sized integers. For example, a REAL*8 array of 2,000,000,000 elements would occupy nearly 16GB of memory, but could be indexed using default integers.
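
The following sketch shows an index that no longer fits in a default INTEGER. The names are hypothetical, and the program needs a little over 2GB of memory, so it is only meaningful as a 64 bit build.

      PROGRAM big_index
      INTEGER*8 n, i
      INTEGER*1, ALLOCATABLE :: flags(:)
      INTEGER ierr
      ! Start from 2**31-1, the largest default INTEGER,
      ! and go beyond it using 8-byte arithmetic.
      n = HUGE(1)
      n = n + 1000
      ALLOCATE(flags(n), STAT=ierr)
      IF (ierr /= 0) STOP 'not enough memory for this sketch'
      i = n
      flags(1) = 0
      flags(i) = 1
      PRINT*, 'last index = ', i, ' value = ', flags(i)
      END PROGRAM big_index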

The main value of 64-bit compilation is that the available address space has increased from 4GB to approximately 1.8 x 10¹⁹ bytes! This means that for the foreseeable future (possibly forever!), the size of programs will be limited only by the amount of physical memory available on a system.

Arrays that are ALLOCATEd, or which are in COMMON or in MODULEs, can exceed the 4GB limit, except that initialised arrays must fit within the .EXE or .DLL file to which they belong, and the size of these files cannot extend beyond the 4GB limit. This is a Microsoft limit, but is fairly reasonable, since the time needed to load a 4GB file would be excessive!

COMMON blocks and MODULE arrays are allocated dynamically as a program starts so that they are not subject to the 4GB restriction. This is applied to all such storage blocks, because a program may exceed the 4GB limit even though each individual array lies within this limit.

Local arrays (static or dynamic) are restricted as in 32 bits. This is because it is not feasible to extend the hardware stack to sizes > 4GB, and SAVE'd variables must fit within the EXE or DLL file to which they belong. Users who require a very large local array should put it in a COMMON block or MODULE referenced by only the one routine.
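
As a sketch of that advice (the names and the array size are hypothetical), the work array below would be too large as a local array but can live in a MODULE that only one routine uses. It occupies 25000 x 25000 x 8 bytes = 5GB, so the program needs at least that much memory to run.

      MODULE work_store
      INTEGER, PARAMETER :: nmax = 25000
      REAL*8 work(nmax,nmax)
      END MODULE work_store

      SUBROUTINE fill_work(v)
      ! The only routine that references work_store
      USE work_store
      REAL*8 v
      INTEGER i, j
      DO j = 1, nmax
        DO i = 1, nmax
          work(i,j) = v
        END DO
      END DO
      PRINT*, work(nmax,nmax)
      END SUBROUTINE fill_work

      PROGRAM module_demo
      CALL fill_work(1.0d0)
      END PROGRAM module_demo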

Since the code can be distributed across multiple DLL's plus an EXE file, the code itself is also not limited to 4GB - although this is not usually a serious concern.

The various 64-bit Windows operating systems provide less than the full 1.8 x 10¹⁹ address space, and the size of this space varies somewhat with the available physical memory on the system. Nevertheless, these limits are very generous and will increase as physical memory becomes more plentiful. In part, these limits are due to the fact that the paging mechanism itself requires memory.

For further information see https://msdn.microsoft.com/en-us/library/aa366778.aspx.

For 64 bit applications, the pair of DLLs SALFLIBC64.DLL and CLEARWIN64.DLL takes the place of the 32 bit SALFLIBC.DLL. Currently CLEARWIN64.DLL (which contains much more than ClearWin+) is compiled with Microsoft C++. In the future this may be absorbed into SALFLIBC64.DLL but will remain independent for use with third-party compilers.

Perhaps surprisingly, FTN95.exe and SLINK64.exe are 32-bit executables, and so still require access to SALFLIBC.DLL at compile time.

Note that the extra executables and DLL's to support 64-bit mode can coexist with those that support 32-bit operations because they have different names.

Contents of a clrwin.manifest file...

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v2">
    <security>
      <requestedPrivileges>
        <requestedExecutionLevel level="asInvoker" uiAccess="false"/>
      </requestedPrivileges>
    </security>
  </trustInfo>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="Win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0"
                        processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"/>
    </dependentAssembly>
  </dependency>
</assembly>

64-bit CODE/EDOC in FTN95

The AMD 64-bit architecture

This architecture was invented by AMD, and was later adopted by Intel when their own Itanium 64-bit architecture was not received with enthusiasm. It is widely known as x86-64 or x64 (Intel's own name for it is Intel 64). It is the basis of most modern PCs, and is targeted by FTN95 when the /64 switch is used.

The AMD 64-bit architecture has 16 general purpose integer registers:

RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15.

The first eight registers correspond to the 32-bit register set, and retain some of the same functionality. Thus RSP is the stack pointer and descends as the stack expands, RCX, RSI and RDI are used for string operations just as they are in 32-bits, and RAX is used by convention to return integer function values. RBP does not correspond in function to EBP; however, it is given a special function in Silverfrost code and should not be modified in normal circumstances.

All these registers hold 64 bits (8 bytes) and can therefore hold a pointer to anywhere in the 64-bit address space.

64-bit programs can access two different sets of floating point registers: the old floating point stack of eight 80-bit registers, and a set of registers designated XMM0 - XMM15, known as the SSE registers. The SSE registers are 16 bytes wide and can hold multiple values simultaneously: four REAL*4 floating point values, or two REAL*8 values. They can also hold integer values. These registers do not 'know' what data they contain, so it is up to the programmer to keep track. In particular, if you load a REAL*8 value into an XMM register and wish to store it as a REAL*4, you must first use the appropriate conversion instruction.

Strangely, the old coprocessor stack instructions do offer some functionality that is not present in the newer SSE instruction set. For example, SIN and COS can be evaluated in one instruction.

Silverfrost CODE/EDOC conventions

Let us start with a simple executable example of a 64-bit CODE/EDOC sequence that simply sums a vector of REAL*8 values. It is not meant to be optimal because it does not use the parallel execution facilities of the SSE registers.


      REAL*8 vec(3),ans
      DATA vec/3.0d0,4.0d0,5.0d0/
      CALL sum(vec,3,ans)
      PRINT*,ans
      END   
      
      SUBROUTINE sum(vec,n,ans)
      INTEGER n
      REAL*8 vec,ans
      CODE
       MOV_Q     RDX,=VEC   ! The '=' denotes a (non-immediate) constant or, as in this case, the address of an argument
       MOV_Q     R14,=N     ! Remember all addresses are 64-bit - hence the use of MOV_Q
       MOVSX_Q   R14,[R14]  ! Instructions and register names are case insensitive
                            ! N is only a 32-bit integer, so it is sign extended to 64 bits
       XORPD     XMM0,XMM0  ! One way to zero an XMM register: a bitwise exclusive OR of the register with itself
1      ADDSD     XMM0,[RDX]
       ADD_Q     RDX,8      ! This uses an immediate constant
       DEC_Q     R14
       JNE       $1         ! Labels are denoted by a '$'
       MOV_Q     RCX,=ans     
       MOVSD     [RCX],XMM0 ! Store the accumulated answer (8 bytes only) in the argument ANS
      EDOC
      END

This illustrates a variety of points:

1) The instructions that operate on the integer registers can operate on 1, 2, 4, or 8 byte operands. These are distinguished by a suffix, thus the MOV instruction takes the forms MOV_B, MOV_H, MOV, MOV_Q.

2) Unlike the 32-bit code/edoc, the register name does not change when the operation operates on a smaller number of bytes.

3) Operations that work on 4 bytes of a register (MOV, ADD, etc) also clear the upper 4 bytes of the register, whereas 2-byte and 1-byte instructions do not change the other bytes of the register. This is a feature of the hardware, not a Silverfrost convention.

4) Labels are prefixed by a '$' when used, just as is the case in 32-bit mode.

5) When accessing a Fortran argument, you need to first access its address (an 8-byte quantity). The notation =N is used to access the address of argument N. The '=' notation can also be used to address a constant in memory, for example:

	   
                     MOVSD    XMM3,=2.0d0

6) The MOVSX_Q instruction sign extends a 32-bit integer to 64-bits. In situations where a number is known to be non-negative, this extension can be obtained for free using point 3 above.
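
As a sketch of point 6, and assuming that N is known to be non-negative, the two instructions that load N in the example above could be written with a plain 4-byte MOV in place of MOVSX_Q:

       MOV_Q     R14,=N     ! Address of the argument N (8 bytes)
       MOV       R14,[R14]  ! 4-byte load; the upper 4 bytes of R14 are cleared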

In general a good way to learn to write instructions inside CODE/EDOC is to compile simple code samples with the /EXPLIST option, which will display the instructions generated by the compiler line by line in essentially the same format that you will use.

Referencing COMMON, MODULE, and ALLOCATE'd variables

Because most COMMON blocks are allocated as the program starts up (as are large arrays in MODULE's) the simplest way to access these objects, as well as explicitly ALLOCATE'd arrays, is to take their address before entering the CODE/EDOC. For example:

        COMMON/FRED/alpha,beta(100),gamma
        INTEGER*8 alpha,beta,gamma
        INTEGER*8 addressof_beta
        addressof_beta=loc(beta)
        CODE
         MOV_Q    R10,addressof_beta   !This loads the address of beta(1)
         MOV_Q    [R10+8],42           !This sets beta(2) to the value 42
        EDOC
        PRINT*,beta(2)
        END

The 64-bit address space

The 32-bit address space provided a theoretical maximum 2³² (4 x 10⁹) addressable bytes. Correspondingly, the 64-bit address space offers a theoretical maximum 2⁶⁴ (1.8 x 10¹⁹) addressable bytes. This means that, rather like in the early days of the 32-bit architecture, when a typical computer might have vastly less than 2³² bytes (4 GB) of memory, the virtual address space is only very sparsely populated.

Indeed, the 64-bit virtual address space is so large that it isn't possible to provide page tables to cover the address space. This means that the amount of virtual address space available to a program is determined in a way that depends on the version of Windows in use, and the total amount of main memory on the computer (say 16 GB). This number is still extremely large. However, it is relevant if you use calls to VirtualAlloc to access high memory addresses in an absolute way.

Using the SSE registers for parallel computation

Instructions like MOVDQA will load a pair of REAL*8 numbers into an XMM register. Since these numbers are just bits, the instruction can also be used to move four REAL*4 numbers into an XMM register. However, this instruction will fault if the data is not aligned on a 16-byte boundary. This is problematic because REAL*4 and REAL*8 numbers are aligned wherever possible (EQUIVALENCE can prevent alignment) to 4 and 8 bytes respectively. In practice it turns out that MOVDQU (which is reputed to be slower than MOVDQA) seems to run at the same speed for aligned data, and only somewhat slower for non-aligned data, but generates no alignment faults.
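
The following untested sketch adapts the earlier summation example so that two REAL*8 values are added per iteration. It assumes that N is even and at least 2, and that CODE/EDOC accepts the standard SSE2 mnemonic ADDPD (packed double precision add), which does not appear elsewhere in these notes. The names sum2, pair, npairs and addressof_pair are hypothetical. The two partial sums are stored in a local array and combined in ordinary Fortran.

      SUBROUTINE sum2(vec,n,ans)
      INTEGER n
      REAL*8 vec(n),ans
      REAL*8 pair(2)
      INTEGER*8 addressof_pair,npairs
      npairs=n/2
      addressof_pair=loc(pair)
      CODE
       MOV_Q     RDX,=VEC           ! Address of the argument VEC
       MOV_Q     R14,npairs         ! Loop count: one pass per pair of values
       MOV_Q     RCX,addressof_pair
       XORPD     XMM0,XMM0          ! Zero both partial sums
1      MOVDQU    XMM1,[RDX]         ! Load two REAL*8 values, no alignment needed
       ADDPD     XMM0,XMM1          ! Two additions at once (assumed mnemonic)
       ADD_Q     RDX,16
       DEC_Q     R14
       JNE       $1
       MOVDQU    [RCX],XMM0         ! Store both partial sums (16 bytes) in PAIR
      EDOC
      ans=pair(1)+pair(2)
      END

Because the additions are performed in a different order, the result may differ from that of the scalar version in the last few bits.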

It is also worth reading this discussion about alignment issues: http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/