Appendix E. In-line Assembly

The LynPlexS and LynPlexC compilers both support in-line assembly.

In-line assembly is not available in the LynPlex interpreter.

Why is it desirable to be able to use assembly code within the program?

  • To run ASM instructions that are not available to the LynPlexS compiler
  • To modify the assembly code generated by the LynPlexS compiler
  • To optimize the assembly code generated by the LynPlexS compiler
  • To learn assembly language programming

The compilers currently produce code only for Intel 80x86-based machines. In the future, however, they might well be ported to a platform that uses a different instruction set. Asm blocks should therefore be used only when necessary.

The use of Asm code in a program is an advanced feature and should not be used unless the programmer is familiar with Asm programming. The topic of assembly language programming is beyond the scope of this reference manual.

The Asm Block

Description

An Asm block is used to insert specific machine-code instructions in a program in order to perform operations that cannot be carried out using the features of the language or to hand-optimize performance-sensitive sections of code.

Syntax

asm
architecture-dependent instructions
end asm
or
asm architecture-dependent instruction

LynPlexC

Syntax
Comments in an Asm block use the same syntax as ordinary LynPlex comments, not the “ ; ” that is usual in 80x86 assemblers.

The syntax of the in-line assembler is a simplified form of Intel syntax. Intel syntax is used by the majority of x86 assemblers, such as MASM, TASM, NASM, YASM and FASM.

In general, the destination of an instruction is placed first, followed by the source. Variables and functions defined by a program may be referenced in an Asm block. The assembler used by LynPlexC is GAS, using the .intel_syntax noprefix directive. Asm blocks are passed through unmodified, except that local variable names are replaced with stack frame references and comments are removed.

Instruction syntax is mostly the same as FASM. One important difference is that GAS requires size settings to be followed by the word “ptr”.

// Assuming "blah" is a LynPlex global or local UINTEGER variable 
mov eax,[blah]          // Fine, size obvious 
inc [blah]              // Bad, size not specified 
inc dword [blah]        // All said, but GAS still won't accept this 
inc dword ptr [blah]    // GAS needs "ptr" here 

The return value of a function may be set by using the Function keyword within brackets as shown in the example below.

Example

// This is an example for the x86 architecture.
Function AddFive(ByVal num As Integer) As Integer
  Asm
    mov eax, [num]
    add eax, 5
    mov [Function], eax
  End Asm
End Function

Dim ix As Integer = 4

Print "4 + 5 = ":AddFive(ix)    //4 + 5 = 9

LynPlexC uses AS/GAS, the GNU assembler used by GCC. As this is an external program, some quirks apply:

The line numbers in errors reported by LynPlexC for Asm blocks do not refer to the LynPlex source file. Because LynPlexC simply displays the errors returned by AS, the line numbers refer to the generated assembly file. To make LynPlexC preserve the assembly file, the compiler must be invoked with the -R option (“don't delete ASM files”).

Label names are case sensitive inside Asm blocks.

LynPlexS

Example

Function double Pi()
// example of loading an 80-bit extended precision real
// and returning it as a DOUBLE
  asm
    fldpi               ; load pi into st(0)
    jmp   end.Pi.fldpi  ; return with pi in st(0)
  end asm
end function

In-line assembly code can only be used within a declared function.

To have a function return an integer value, place the value into the eax register before returning. A function whose return type requires more than 4 bytes must instead return the address of the value or string in eax.

function SetBit(bit, x)
  asm
    mov   ecx,[ebp+0x08] ; bit goes into ecx: the shift count must be in cl
    mov   ebx,[ebp+0x0C] ; put x into ebx
    mov   eax,0x01       ; put 1 into eax
    shl   eax,cl         ; shift left bit times = 2^bit
    or    eax,ebx        ; inclusive OR x with result
    jmp   end.SetBit.setbit  ; jump to end of function
  end asm
end function
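
For readers less familiar with x86, the computation that SetBit is meant to perform (returning x with bit number bit set) can be sketched in C. This is only an illustration and not LynPlex code: the name set_bit and the unsigned 32-bit types are assumptions. The count is masked to five bits because that is how x86 treats the count in cl for a 32-bit shift.

```c
#include <stdint.h>

/* Hypothetical C counterpart of SetBit (illustration only):
   returns x with bit number `bit` set. */
uint32_t set_bit(uint32_t bit, uint32_t x)
{
    /* x86 uses only the low 5 bits of cl for a 32-bit shift,
       so the count is masked the same way here. */
    uint32_t mask = (uint32_t)1 << (bit & 31u);

    /* Inclusive OR of x with 2^bit. */
    return x | mask;
}
```

Under these assumptions, set_bit(3, 0) gives 8 and set_bit(0, 8) gives 9.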

The LynPlexS compiler generates assembly code which is compatible with the GoAsm assembler.

An HTML Help manual for GoAsm can be found in the \LynPlexS\manual folder. See GoAsm.chm.

For further resources concerning assembly language programming, see Assembly Language Programming Resources below.

80x86 Specific Code

Register Preservation
The registers ebx, esi, and edi must generally be preserved on most or all operating systems that run on x86 CPUs. For this reason, these registers are pushed onto the stack when an Asm block is opened and restored when the block is closed. You can therefore use them without explicitly saving them.

You should not change esp and ebp, since they are usually used to address local variables.

Register Names
The names of the registers for the x86 architecture are written as follows in an Asm block:

4-byte integer registers: eax, ebx, ecx, edx, ebp, esp, edi, esi
2-byte integer registers: ax, bx, cx, dx, bp, sp, di, si (low words of 4-byte e- registers)
1-byte integer registers: al, ah, bl, bh, cl, ch, dl, dh (low and high bytes of 2-byte -x registers)
Floating-point registers: st(0), st(1), st(2), st(3), st(4), st(5), st(6), st(7)
MMX registers (aliased onto floating-point registers): mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7
SSE registers: xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7
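
The relationship between the 4-byte, 2-byte, and 1-byte register names can be modelled with plain integer arithmetic. The following C sketch is only an illustration; the function names are hypothetical and merely compute which part of a 4-byte value each narrower register name would refer to.

```c
#include <stdint.h>

/* Hypothetical helpers (illustration only). Given the value of a 4-byte
   register such as eax, each returns the part seen through a narrower name. */

/* Low word of the 4-byte register, e.g. ax within eax. */
uint16_t low_word(uint32_t e)
{
    return (uint16_t)(e & 0xFFFFu);
}

/* Low byte of the 2-byte register, e.g. al within ax. */
uint8_t low_byte(uint32_t e)
{
    return (uint8_t)(e & 0xFFu);
}

/* High byte of the 2-byte register, e.g. ah within ax. */
uint8_t high_byte(uint32_t e)
{
    return (uint8_t)((e >> 8) & 0xFFu);
}
```

For a value of 0x12345678 in the 4-byte register, the low word is 0x5678, the low byte is 0x78, and the high byte is 0x56. The upper 16 bits have no register name of their own.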

Unsafe instructions
Note that the LynPlexC compiler produces 32-bit protected-mode code for the x86, which usually runs at an unprivileged user level. This means that privileged and sensitive instructions will assemble, but will probably either not work correctly or cause a runtime “General Protection Fault”, “Illegal instruction”, or SIGILL error.

The following are the privileged and sensitive instructions as of the Intel Pentium 4 and Xeon:

cli *1
clts
hlt
in *1
ins *1
int *1
into *1
invd
invlpg
lgdt
lidt
lldt
lmsw
ltr
mov to/from CRn, DRn, TRn
out *1
outs *1
rdmsr
rdpmc *2
rdtsc *2
sti *1
str
wbinvd
wrmsr
all SSE2 and higher instructions *2

*1: sensitive to IOPL, fine in DOS
*2: sensitive to permission bits in CR4, see below

The privileged instructions will work “correctly” in DOS when running on a Ring 0 DPMI kernel, like the (non-default) Ring 0 version of CWSDPMI, WDOSX or D3X. Nevertheless, most of them are not really useful, and they are dangerous when executed from DPMI code.

RDTSC (Read Time Stamp Counter) has been shown to be allowed by most or all operating systems. However, the usefulness of RDTSC has been diminished by the advent of multi-core and hibernating CPUs. SSE2 and higher instructions are disabled “by default” after CPU initialization. Windows and Linux usually do enable them; in DOS it is the business of the DPMI host: HDPMI32 will enable them, CWSDPMI won't.

The INT instruction is usable in the DOS version/target only. Note that it works slightly differently from real-mode DOS.

The segment registers (cs, ds, es, fs, gs) should not be changed from an Asm block, except in certain cases with the DOS port (note that they do NOT work the same way as in real-mode DOS). The operating system or DPMI host is responsible for memory management; the meaning of segments (selectors) in protected mode is very different from real-mode memory addressing.

Note that these “unsafe” instructions are not guaranteed to raise a “visible” crash even when run with insufficient privilege. The OS or DPMI host can decide to “emulate” them, either functionally (reading from some CRx works under HDPMI32) or as a “dummy” (nothing happens and the instruction passes silently, like a NOP).

Assembly Language Programming Resources


lynplex/lp0e.txt · Last modified: 2012/09/08 15:27 (external edit)