From version 8.00 onwards FTN95 includes a 64-bit compiler as well as the long established 32-bit version. 64-bit code is produced by using the /64 compiler switch. Plato also has configuration options for enabling 64-bit code production (these ultimately result in Plato passing /64 to the compiler).
It is not possible to mix 32 and 64-bit code. Compilation and linking is either all 32-bit or all 64. Linking is done by the new 64-bit linker: slink64.
FTN95 creates 64-bit executables and DLLs when:
SLINK64 can be used in:
a) command line mode
b) interactive mode or
c) script file mode
Here is an example of using command line mode...
    FTN95 prog.f95 /64
    FTN95 sub.f95 /64
    SLINK64 prog.obj sub.obj /file:prog.exe
Here is an example of using interactive mode...
    SLINK64
    $ lo prog.obj
    $ lo sub.obj
    $ file prog.exe
Here is an example of using a script file...
where prog.inf contains...
    lo prog.obj
    lo sub.obj
    file prog.exe
For further information see below or type...
SLINK64 automatically scans commonly used Windows DLLs. If a Windows function located in (say) xxx.dll is reported as missing then the DLL should be loaded by using a script command of the form
    lo C:\Windows\Sysnative\xxx.dll
where C:\Windows illustrates the value of the %windir% environment variable.
A 64-bit debugger (SDBG64) is provided. It operates in essentially the same way as the corresponding 32-bit debugger.
64-bit ClearWin+ was previously available for use with third party compilers via clearwin64.dll. This DLL has now been extended for use with 64-bit FTN95. Users who have already adapted their code for use with third-party compilers can continue to use their modified code. Alternatively native FTN95/ClearWin+ code can be used without change apart from the following exceptions:
Use the SRC command line option /r for 64-bit applications and link the resulting .res file (together with the .obj files) via SLINK64.
Current experience suggests that using the "default.manifest" in a resource script causes the resulting 64-bit application to fail to load. However, a user supplied manifest file can improve the appearance. The text of a suitable manifest file is presented below.
A RESOURCES directive can be used at the end of a 64-bit Fortran main program but it only has effect when used with FTN95 command line options /LINK or /LGO. Otherwise a separate call to SRC is required. (For Win32 main programs, FTN95 automatically adds the resources to the main object file).
Silverfrost INCLUDE files have been modified so that Microsoft HANDLEs have type INTEGER(KIND=7).
Silverfrost MOD files can be used without change provided they are updated to those in this release.
Note that user FTN95 MOD files for 64-bit applications may differ from those for 32-bit applications. So FTN95 uses the extension .mod64 for 64-bit MOD files whilst retaining the extension .mod for 32-bit MOD files. The corresponding object files always differ and the respective linker (SLINK or SLINK64) will reject object files of the wrong kind.
By default FTN95 uses the extension .obj for both 64-bit and 32-bit object files. For projects, both Plato and Visual Studio retain the default extension and use a system of sub-folders in order to create executables for different platforms (such as Win32, x64 and .NET) and for different configurations (such as Debug, CheckMate and Release).
Users who prefer to build their applications using batch and/or makefiles can adopt a similar sub-folder approach to that used by Plato and Visual Studio. Alternatively, 64-bit object files can be given a different extension (e.g. .o64) by using /BINARY (together with the object file name) on the FTN95 command line. In that way, 64-bit and 32-bit object files could reside in the same folder.
The associated release of Plato is configured by default to use FTN95 when you select "Release x64" on the main toolbar. Previously this used gFortran.
Plato can be used as a debugger to step into 64-bit FTN95 code or to launch the 64-bit debugger SDBG64 as an external application. See the Settings dialog on the Tools menu.
Like salflibc.dll, salflibc64.dll and clearwin64.dll can be freely redistributed with your applications and DLLs.
1) When using the standard Fortran SIZE intrinsic, FTN95 with /64 returns a 64-bit integer despite the fact that this is not strictly standard conforming. In certain very special circumstances, this change can cause existing code to fail. For example, failure will occur if SIZE(x) appears as the value of an argument to an overloaded subprogram (i.e. a subprogram that has various definitions depending on the types of its arguments). A new command line option /SIZE_ISO is provided in order to resolve this conflict.
2) It is possible that there may be some slight loss of precision when porting from 32-bit to 64-bit calculations. This is mainly because some FTN95 32-bit mode floating point calculations actually use hidden extended precision on the way to producing double or single precision results. It is therefore possible that the process of porting to 64-bits may expose a numerically unstable calculation (i.e. one that depends critically on the level of round-off error). In the same way, in extreme cases it is possible that new exceptions may appear at runtime due to floating point overflow. Overflow can occur directly or as the result of dividing by a value that has underflowed to zero. In some cases it is possible to resolve these issues by using a scaling factor in the calculations.
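As a minimal sketch of the scaling-factor idea in note 2, the following standard Fortran program computes SQRT(a*a + b*b) for two large single precision values. The program name, variable names and test values are illustrative assumptions and are not taken from the FTN95 documentation. Without extended-precision intermediates a*a exceeds the single precision range, whereas dividing by the larger magnitude first keeps every intermediate no larger than about 1.

```fortran
PROGRAM scaled_norm
  IMPLICIT NONE
  REAL :: a, b, s, r

  a = 3.0E30
  b = 4.0E30

  ! Naive form (left commented out): a*a is about 9.0E60, far beyond the
  ! single precision maximum of roughly 3.4E38, so it overflows unless
  ! intermediates happen to be held in a wider format.
  ! r = SQRT(a*a + b*b)

  ! Scaled form: divide by the larger magnitude so that each square is <= 1.
  s = MAX(ABS(a), ABS(b))
  IF (s > 0.0) THEN
    r = s * SQRT((a/s)**2 + (b/s)**2)
  ELSE
    r = 0.0          ! both inputs are zero, so the norm is zero
  END IF

  PRINT *, r
END PROGRAM scaled_norm
```

The same pattern (factor out the largest magnitude, work with ratios, multiply back at the end) can often be applied to sums of squares and similar expressions that are sensitive to overflow or underflow.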
The SLINK64 command line
SLINK64 can be used in 3 ways...
1) It can use a series of commands from a file (recommended). The commands are placed in a file with the .inf or .link suffix, and SLINK64 is invoked thus:
2) It can be used interactively, using the same commands as in (1).
3) It can be used from the command line. This can be derived from the command specifications. Thus the command lo <obj file> can be coded on the command line as /lo <obj file>
SLINK64 is automatically called when /link or /lgo is used on the FTN95 command line. The name of the executable or DLL can be supplied after /link (this is optional for executables but mandatory for DLLs). Also /stack can be included followed by the stack size as a number of megabytes. /map can also be used in this context.
The WINAPP directive in the Fortran code creates a Windows application and this directive can optionally be followed by the name of a resource script. Alternatively a resource script can be included by placing the script after the main program by using the RESOURCES directive.
Here are the required SLINK64 commands for three slightly more complicated scenarios:
1) To link a simple program that uses a DLL:
The file (say) ExtraDLL.dll is scanned for entry points but it isn't incorporated in MyProgram.exe - so MyProgram.exe will require the DLL somewhere on the path at runtime.
    lo MyProgram.obj
    lo ExtraDLL.dll
    file MyProgram.exe
2) To link a number of files to create a DLL that exports all subroutine/function names:
    lo file1.obj
    lo file2.obj
    lo file3.obj
    file MyLibrary.dll
3) To create a windows program with some ClearWin+ code that uses resources:
The resources are prepared by:
    SRC MyResources.rc /r
Then the SLINK64 commands are:
    lo MyProgram.obj
    lo MyResources.res
    windows
    file MyProgram.exe
Programs compiled with FTN95 using the /64 option use the AMD64 instruction set (subsequently adopted by Intel, and referred to as x64 or x86_64), which is almost universally available on modern PCs. This code cannot be mixed with legacy 32-bit code, nor can it access legacy 32-bit DLLs. 64-bit object files must be linked using the new utility SLINK64. This object file format is incompatible with all third-party link utilities.
The default size of INTEGER variables remains unchanged (maximum value 2^31-1), so INTEGER*8 (8-byte) variables must be used to index extremely large arrays. These variables are implemented in a more efficient and natural way in 64-bits. Note that some arrays that would not fit in the old 4GB limit may still be indexable using default sized integers, for example a REAL*8 array of 2,000,000,000 elements would occupy nearly 16GB of memory, but could be indexed using default integers.
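The following is a small sketch of indexing beyond the default integer range, written in standard Fortran. The program and variable names are hypothetical, and the array uses a one-byte integer kind purely to keep the memory requirement at roughly 3GB. It assumes a 64-bit build and enough free memory; otherwise the allocation is expected to fail and the program reports this.

```fortran
PROGRAM big_index
  IMPLICIT NONE
  INTEGER, PARAMETER :: i8 = SELECTED_INT_KIND(18)   ! at least 64-bit integers
  INTEGER, PARAMETER :: i1 = SELECTED_INT_KIND(2)    ! small (typically 1-byte) integers
  INTEGER(KIND=i8) :: n
  INTEGER(KIND=i1), ALLOCATABLE :: flags(:)
  INTEGER :: stat

  ! 3,000,000,000 exceeds 2^31-1, so the bound needs a 64-bit integer.
  n = 3000000000_i8

  ALLOCATE(flags(n), STAT=stat)
  IF (stat /= 0) THEN
    PRINT *, 'Allocation failed'
    STOP
  END IF

  ! Only the first and last elements are touched to keep the run short.
  flags(1) = 1_i1
  flags(n) = 2_i1
  PRINT *, 'Last index: ', n, ' value: ', flags(n)

  DEALLOCATE(flags)
END PROGRAM big_index
```

Any loop variable that runs over the whole of such an array would also need to be declared with the 64-bit kind.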
The main value of 64-bit compilation is that the available address space has increased from 4GB to approximately 1.8 x 10^19 bytes! This means that for the foreseeable future (possibly forever!), the size of programs will be limited only by the amount of physical memory available on a system.
Arrays that are ALLOCATEd, or which are in COMMON or in MODULEs can exceed the 4GB limit, except that initialised arrays must fit within the .EXE or .DLL file to which they belong, and the size of these files cannot extend beyond the 4GB limit. This is a Microsoft limit, but is fairly reasonable, since the time needed to load a 4GB file would be excessive!
COMMON blocks and MODULE arrays are allocated dynamically as a program starts so that they are not subject to the 4GB restriction. This is applied to all such storage blocks, because a program may exceed the 4GB limit in total even though each individual array lies within this limit.
Local arrays (static or dynamic) are restricted as in 32-bits. This is because it is not feasible to extend the hardware stack to sizes > 4GB, and SAVE'd variables must fit within the EXE or DLL file to which they belong. Users who require a very large local array should put it in a COMMON block or MODULE referenced by only the one routine.
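A sketch of that workaround is given below, assuming FTN95 with /64 and the dynamic allocation of MODULE arrays described above. The module, routine and array names are hypothetical. The array is left uninitialised, since initialised arrays must still fit within the EXE or DLL file. Other compilers may need different settings for a static array of this size.

```fortran
MODULE work_storage
  IMPLICIT NONE
  ! 600,000,000 REAL*8 elements occupy about 4.8GB, which is more than a
  ! local array could hold, yet the bound still fits in a default integer.
  INTEGER, PARAMETER :: nwork = 600000000
  DOUBLE PRECISION :: work(nwork)
END MODULE work_storage

SUBROUTINE fill_work()
  ! The only routine that references the module, so 'work' behaves like a
  ! private work array belonging to this routine.
  USE work_storage
  IMPLICIT NONE
  INTEGER :: i
  DO i = 1, nwork
    work(i) = 0.0D0
  END DO
END SUBROUTINE fill_work

PROGRAM demo
  IMPLICIT NONE
  CALL fill_work()
  PRINT *, 'done'
END PROGRAM demo
```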
Since the code can be distributed across multiple DLL's plus an EXE file, the code itself is also not limited to 4GB - although this is not usually a serious concern.
The various 64-bit Windows operating systems provide less than the full 1.8 x 10^19 address space, and the size of this space varies somewhat with the available physical memory on the system. Nevertheless, these limits are very generous and will increase as physical memory becomes more plentiful. In part, these limits are due to the fact that the paging mechanism itself requires memory.
For further information see https://msdn.microsoft.com/en-us/library/aa366778.aspx.
In 64-bit mode, the pair of DLLs SALFLIBC64.DLL and CLEARWIN64.DLL takes the place of the 32-bit SALFLIBC.DLL. Currently CLEARWIN64.DLL (which contains much more than ClearWin+) is compiled with Microsoft C++. In the future this may be absorbed into SALFLIBC64.DLL but will remain independent for use with third-party compilers.
Perhaps surprisingly, FTN95.exe and SLINK64.exe are 32-bit executables, and so still require access to SALFLIBC.DLL at compile time.
Note that the extra executables and DLL's to support 64-bit mode can coexist with those that support 32-bit operations because they have different names.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v2">
    <security>
      <requestedPrivileges>
        <requestedExecutionLevel level="asInvoker" uiAccess="false"/>
      </requestedPrivileges>
    </security>
  </trustInfo>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="Win32" name="Microsoft.Windows.Common-Controls" version="6.0.0.0" processorArchitecture="*" publicKeyToken="6595b64144ccf1df" language="*"/>
    </dependentAssembly>
  </dependency>
</assembly>
The AMD 64-bit architecture
This architecture was invented by AMD, and was later adopted by Intel when their own Itanium 64-bit architecture was not received with enthusiasm. Intel use the term x86-64. It is the basis of most modern PCs, and is targeted by FTN95 when the /64 switch is used.
The AMD 64-bit architecture has 16 general purpose integer registers:
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15.
The bottom eight registers correspond to the 32-bit register set, and retain some of the same functionality. Thus RSP is the stack pointer and descends as the stack expands, RCX, RSI and RDI are used for string operations just as they are in 32-bits, and RAX is used by convention to return integer function values. RBP does not correspond in function to EBP, however it is given a special function in Silverfrost code (explained later), and should not be modified in normal circumstances.
All these registers hold 64-bits (8 bytes) and can therefore hold a pointer to anywhere in the 64-bit address space.
64-bit programs can access two different sets of floating point registers - the old floating point stack of eight 80-bit registers, and a set of registers designated XMM0 - XMM15, and known as the SSE registers. These registers can hold multiple values simultaneously - four REAL*4 floating point values, or two REAL*8 values. They can also hold integer values. Thus these registers are 16 bytes in width. These registers do not 'know' what data they contain - so it is up to the programmer to keep track. In particular, if you load a REAL*8 value into an XMM register and wish to store it as a REAL*4, you must first use the appropriate conversion instruction.
Strangely, the old coprocessor stack instructions do offer some functionality that is not present in the newer SSE instruction set - for example SIN and COS can be evaluated in one instruction.
Silverfrost CODE/EDOC conventions
Let us start with a simple executable example of a 64-bit CODE/EDOC sequence that simply sums a vector of REAL*8 values. It is not meant to be optimal because it does not use the parallel execution facilities of the SSE registers.
      REAL*8 vec(3),ans
      DATA vec/3.0d0,4.0d0,5.0d0/
      CALL sum(vec,3,ans)
      PRINT*,ans
      END

      SUBROUTINE sum(vec,n,ans)
      INTEGER n
      REAL*8 vec,ans
      CODE
      MOV_Q RDX,=VEC       ! The '=' denotes a (non-immediate) constant or, as in this case, the address of an argument
      MOV_Q R14,=N         ! Remember all addresses are 64-bit - hence the use of MOV_Q
      MOVSX_Q R14,[R14]    ! Instructions and register names are case insensitive
                           ! N is only a 32-bit integer, so it is sign extended to 64-bits
      XORPD XMM0,XMM0      ! This is one way to zeroise an XMM register; it does a bitwise exclusive OR
    1 ADDSD XMM0,[RDX]
      ADD_Q RDX,8          ! This uses an immediate constant
      DEC_Q R14
      JNE $1               ! Labels are denoted by a '$'
      MOV_Q RCX,=ans
      MOVSD [RCX],XMM0     ! Store away the accumulated answer (the low 8 bytes only) in the argument ANS
      EDOC
      END
This illustrates a variety of points
1) The instructions that operate on the integer registers can operate on 1, 2, 4, or 8 byte operands. These are distinguished by a suffix, thus the MOV instruction takes the forms MOV_B, MOV_H, MOV, MOV_Q.
2) Unlike the 32-bit code/edoc, the register name does not change when the operation operates on a smaller number of bytes.
3) Operations that work on 4 bytes of a register (MOV, ADD, etc) also clear the upper 4 bytes of the register, whereas 2-byte and 1-byte instructions do not change the other bytes of the register. This is a feature of the hardware, not a Silverfrost convention.
4) Labels are prefixed by a '$' when used, just as is the case in 32-bit mode.
5) When accessing a Fortran argument, you need to first access its address (an 8-byte quantity). The notation =N is used to access the address of argument N. The '=' notation can also be used to address a constant in memory, for example:
      MOVSD XMM3,=2.0d0
6) The MOVSX_Q instruction sign extends a 32-bit integer to 64-bits. In situations where a number is known to be non-negative, this extension can be obtained for free using point 3 above.
In general a good way to learn to write instructions inside CODE/EDOC is to compile simple code samples with the /EXPLIST option, which will display the instructions generated by the compiler line by line in essentially the same format that you will use.
Referencing COMMON, MODULE, and ALLOCATE'd variables
Because most COMMON blocks are allocated as the program starts up (as are large arrays in MODULE's) the simplest way to access these objects, as well as explicitly ALLOCATE'd arrays, is to take their address before entering the CODE/EDOC. For example:
      COMMON/FRED/alpha,beta(100),gamma
      INTEGER*8 alpha,beta,gamma
      INTEGER*8 addressof_beta
      addressof_beta=loc(beta)
      CODE
      MOV_Q R10,addressof_beta
      MOV_Q [R10+8],42     !This sets beta(2) to the value 42
The 64-bit address space
The 32-bit address space provided a theoretical maximum 2^32 (4 x 10^9) addressable bytes. Correspondingly, the 64-bit address space offers a theoretical maximum 2^64 (1.8 x 10^19) addressable bytes. This means that, rather like in the early days of the 32-bit architecture, when a typical computer might have vastly less than 2^32 bytes (4 GB) of memory, the virtual address space is only very sparsely populated.
Indeed, the 64-bit virtual address space is so large that it isn't possible to provide page tables to cover the address space. This means that the amount of virtual address space available to a program is determined in a way that depends on the version of Windows in use, and the total amount of main memory on the computer (say 16 GB). This number is still extremely large. However, it is relevant if you use calls to VirtualAlloc to access high memory addresses in an absolute way.
Using the SSE registers for parallel computation
Instructions like MOVDQA will load a pair of REAL*8 numbers into an XMM register. Since these numbers are just bits, the instruction can also be used to move four REAL*4 numbers into an XMM register. However this instruction will fault if the data is not 16-byte aligned. This is problematic because REAL*4 and REAL*8 numbers are aligned wherever possible (EQUIVALENCE can prevent alignment) to 4 and 8 bytes respectively. In practice it turns out that the MOVDQU instruction (which is reputed to be slower than MOVDQA) seems to run at the same speed for aligned data, and only somewhat slower for non-aligned data, but generates no alignment faults.
It is also worth reading this discussion about alignment issues: http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/