Anchor Chips Incorporated
Application Note
Programming AN3042 (CO-MEM Lite) Bus Master DMA Transfers
Rev 0.95 12/16/98 Mike Davis Applications Engineering
Contents
INTRODUCTION
DMA FEATURE SUMMARY
QUICK START
PC ARCHITECTURE 101
    Which DMA Controller?
    Software
    Hardware
    Other Important Firmware and Software Components
THE SAMPLE CODE
PERFORMANCE
LIMITATIONS OF THIS SAMPLE CODE
    Interrupts
    Data To/From the Local-Side Memory
    Sharing the AN3042 DMA Controller
RECOMMENDED TOOLS
GLOSSARY
REFERENCES
The information in this document is subject to change without notice and should not be construed as a commitment by Anchor Chips Incorporated. While reasonable precautions have been taken, Anchor Chips Incorporated assumes no responsibility for any errors that may appear in this document. No part of this document may be copied or reproduced in any form or by any means without the prior written consent of Anchor Chips Incorporated. Anchor Chips products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Anchor Chips product could create a situation where personal injury or death may occur. Should Buyer purchase or use Anchor Chips products for any such unintended or unauthorized application, Buyer shall indemnify and hold Anchor Chips and its officers, employees, subsidiaries, affiliates and distributors harmless against all claims, costs, damages, expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Anchor Chips was negligent regarding the design or manufacture of the part. Product and company names herein may be the trademarks of their respective owners. Anchor Chips' customers are granted a license to use the information herein, and the associated code on Anchor's web site, only in connection with developing products that utilize Anchor chips. All other uses are prohibited. The acceptance of this document will be construed as an acceptance of the foregoing conditions. Copyright 1998, Anchor Chips Incorporated All rights reserved.
Introduction
This Application Note explains Anchor-supplied sample code that demonstrates how to use the Bus Mastering DMA controller built into the AN3042 (CO-MEM Lite) chip. All components, hardware and software, focus on one objective: to move data (quickly and with minimal loading on other parts of the PC) from Host Memory to the AN3042's Shared Memory, and vice versa. This is shown by the arrow labeled TRANSFERS in Figure 1.

Anchor Chips provides this sample software to help you get your AN3042 design done and into production. Since we've already done much of the system programming for you, you can concentrate on your application, not on low-level system software. Typical sustained performance for this code is 75 MB/sec for Reads from Host Memory, and 100 MB/sec for Writes to Host Memory.

If you are pressed for time, you can use the code as provided, treating it as a black box, without delving into how it actually works. You can focus mostly on the code that calls doDMA, and be done with it. See the DMA Feature Summary and Quick Start. However, we think you will find it useful to understand how the AN3042 fits into the PC architecture to help diagnose problems when things go wrong.

The code discussed here is part of the Visual C++ example Test3042. You can download Test3042.zip from the Anchor Chips web site at http://www.anchorchips.com/pcidev/cmemlite/download.htm . We suggest that you download and unzip Test3042 first, then peruse the code as you read this App Note. If you are using the AN3042 in an embedded system that doesn't use Windows, you may have to make changes at the device driver level, but the application-layer code may be useful unchanged.

We'll cover:
* Features of the DMA Controller.
* How to get started quickly, ignoring most of the details.
* PC Architecture and the Pentium (see Note 1) as it applies to Bus Master DMA transfers.
* The functions testDMA and doDMA, components of Test3042.
* The way our Device Drivers help set up DMAs.
* Performance issues.
* The limitations of this sample code.

Note 1: When this App Note says Host CPU or Host Processor, think Pentium. Pentium means the Pentium itself and all its successors (Pentium, Pentium II and Pentium Pro). Windows uses all Pentium variants in modes which are identical for our purposes (4 KB pages, 32-bit flat model). Although Win9x and WinNT can theoretically run on older processors (386 and 486), even 486 PCs rarely have PCI buses.
DMA Feature Summary
Here's a summary of the features of the AN3042's DMA controller (see p. 2, pp. 39-40, and pp. 51-52 of the AN3042 Technical Reference Manual [Ref. 1]).
* The DMA controller's registers can be accessed (initialized) from either the PCI or Local bus. This is sometimes referred to as PCI-initiated DMA or Local-initiated DMA respectively. In Test3042, we only show PCI-initiated DMA.
* Ownership bits (L and P) in the DMACTL register allow the Host side or Local side to claim the DMA controller (to prevent the other side from corrupting the DMA registers by accessing them at the same time).
* The DMA controller always moves data by making the AN3042 the PCI Bus Master.
* The DMA controller always moves data a DWORD at a time, using all 32 bits (4 byte-lanes) for each data phase. Hence, the size of a transfer can be 4, 8, 12, ..., 16 K bytes, never 1, 2, 3, 5, ... bytes.
* The AN3042's Shared Memory is either the source or destination in every DMA transfer. The other device (the destination or source) can be Host Memory, or any other PCI device that is mapped into the PCI Memory Space (see Memory Space in the Glossary).
* The maximum transfer length is 16 KB.
* Most data phases occur with zero wait states (depending on the bandwidth supported by the PC's chipset and host memory). TRDY# and IRDY# remain asserted, and a new datum is moved on each 33 MHz PCI clock.
* The DMA controller can transfer data to/from any address in the full 32-bit PCI Memory Address Space.
* When the AN3042 is the PCI bus master, it uses the PCI cache line commands MRM (Memory Read Multiple), MRL (Memory Read Line) and MWI (Memory Write and Invalidate) where possible to optimize throughput and minimize the load on the rest of the PC.
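The size rules in this summary (whole DWORDs only, 16 KB maximum) are easy to check before a transfer is programmed. The following is a minimal sketch, not part of Test3042: the helper name isValidDmaLength is hypothetical, and the two extra checks (a DWORD-aligned start offset, and a transfer that stays inside the 16 KB Shared Memory) are assumptions made for illustration.

```cpp
#include <cstdint>

// Limits taken from the feature summary.
const std::uint32_t kDwordBytes  = 4;       // one 32-bit data phase
const std::uint32_t kMaxDmaBytes = 0x4000;  // 16 KB, the size of Shared Memory

// Hypothetical helper (not part of Test3042). Returns true when a transfer of
// lengthBytes starting at Shared Memory byte offset smStart obeys the rules.
bool isValidDmaLength(std::uint32_t smStart, std::uint32_t lengthBytes)
{
    if (lengthBytes == 0 || lengthBytes > kMaxDmaBytes) return false;
    if (lengthBytes % kDwordBytes != 0) return false;  // whole DWORDs only
    if (smStart % kDwordBytes != 0) return false;      // assumption: aligned start
    return smStart <= kMaxDmaBytes - lengthBytes;      // assumption: no overrun
}
```

A caller could run this check just before calling doDMA and report an error instead of starting a transfer the controller cannot express.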
Quick Start
Here's all you need to do to use the AN3042's DMA controller. First, there are a few things you need to do before calling doDMA:

Install the driver (comemL.vxd or comemL.sys) into c:\windows\system and reboot your PC.

In your program:
1. Open the driver by calling comemOpenDriverL (not shown in Listings 1 or 2). See the program file TestUtil.cpp that you downloaded with Test3042.
2. Call comemCopyBarPtrL to get a Linear pointer to the AN3042 so you can access its Op Regs (Operation Registers) from your program (see Listing 1).
3. Call comemAllocContigMemL to allocate a buffer which is contiguous in both physical and linear address space (see Listing 1).
4. Call doDMA (see Listing 1).

That's all there is to it.
Then, here's what happens in doDMA (see Listing 2). For most applications, you can use doDMA unchanged. For simplicity of explanation, let's assume the direction of transfer is towards the Host Memory (the AN3042 controller will do one or more MWI [Memory Write and Invalidate] transactions):

source = a block of n DWORDs in AN3042 Shared Memory (previously loaded with data).
destination = a buffer in Host Memory (previously allocated by a call to comemAllocContigMemL).

1. Set the DMALBASE register to point to the source, the start of the data in Shared Memory (see line 19 of doDMA in Listing 2). A value of 0 points to the zeroth DWORD in Shared Memory.
2. Set the DMASIZE register to the number of DWORDs to be transferred (line 24).
3. Set the DMAHBASE register to the physical address of the destination (line 29).
4. Clear the DMA completion bit (HINT register, bit 5) (line 33).
5. Kick off the DMA by writing to just the low-order byte of the DMACTL register (line 37). Here, we both set the direction and start the DMA.
6. Wait for the DMA complete bit to be set (HINT register, bit 5) (lines 50-67).
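The six steps can be sketched against a mock register block so the sequence can be followed without hardware. Everything here is illustrative: MockDmaRegs is a hypothetical stand-in whose field order is not the real AN3042 register layout, kCtlStartToHost is an invented start value, the completion bit is cleared with a plain write (the real clearing mechanism may differ), and the bounded poll with a device callback is an addition for testability rather than a copy of the loop in doDMA. Only the DMASIZE expression (length - 4) is taken from Listing 2.

```cpp
#include <cstdint>

// Hypothetical stand-in for the AN3042 operation registers. The field order
// and widths are NOT the real layout; see the Technical Reference Manual.
struct MockDmaRegs {
    volatile std::uint32_t dmalbase;
    volatile std::uint32_t dmasize;
    volatile std::uint32_t dmahbase;
    volatile std::uint32_t hint;
    volatile std::uint32_t dmactl;
};

const std::uint32_t kHintDmaDone    = 1u << 5;  // HINT bit 5 (DMA complete)
const std::uint32_t kCtlStartToHost = 0x01;     // hypothetical start/direction value

// Callback that plays the role of the hardware while polling a mock.
typedef void (*DeviceStep)(MockDmaRegs&);

// Returns true when completion is seen within maxPolls polls.
bool dmaToHostSketch(MockDmaRegs& regs, std::uint32_t smStart,
                     std::uint32_t hostPhys, std::uint32_t lengthBytes,
                     unsigned maxPolls, DeviceStep step)
{
    regs.dmalbase = smStart;                       // 1. source in Shared Memory
    regs.dmasize  = lengthBytes - 4;               // 2. size, written as in Listing 2
    regs.dmahbase = hostPhys;                      // 3. physical destination address
    regs.hint     = regs.hint & ~kHintDmaDone;     // 4. clear the completion bit
    regs.dmactl   = (regs.dmactl & ~0xFFu) | kCtlStartToHost;  // 5. low byte only
    for (unsigned i = 0; i < maxPolls; ++i) {      // 6. bounded wait for completion
        if (step) step(regs);
        if (regs.hint & kHintDmaDone) return true;
    }
    return false;                                  // timed out
}

// Mock device: reports completion as soon as it sees the start value.
void instantDevice(MockDmaRegs& regs)
{
    if ((regs.dmactl & 0xFFu) == kCtlStartToHost)
        regs.hint = regs.hint | kHintDmaDone;
}
```

Bounding the poll is a design choice for the sketch: a spin loop that can never exit makes a hung transfer hard to diagnose, whereas a timeout lets the caller report the failure.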
PC Architecture 101
Which DMA Controller?
One type of DMA in PCs is Bus Master DMA. In this App Note (and all other Anchor Chips' PCI literature) the term DMA always refers to Bus Master DMA: DMA driven by the DMA controller in the AN3042, called Bus Master because the AN3042 becomes the PCI bus master. From the perspective of a PCI device, for moves to (Writes) and moves from (Reads) Host Memory, the Target device is always the Host to PCI Bridge. This is called a host transfer. However, the Target device can also be another device on the PCI bus (including other AN3042s). This is called a peer-to-peer transfer.

The other type of DMA in PCs, called System DMA, uses the DMA controller in the chipset. The Intel 8237 chip was the System DMA controller in the original IBM PCs. It was later integrated into PC chipsets, for example, the PIIX3 (PCI ISA IDE Xcelerator) part of Intel's chipsets (see Hardware below regarding chipsets). Every PC has at least two System DMA controllers.

The System DMA controller is far more limited (it can only address the lowest 1 MB of physical memory) and slower than the AN3042 DMA controller, having been designed originally for the ISA bus (a bus that might be able to move 6 MB/sec. on a good day). Moreover, the System DMA controller can only move data between the ISA bus and Host Memory, never between PCI devices.
Software
The code that we provide in Test3042 and the Device Driver has the following parts (see Figure 1):
(c) High-level Application Code
GUI (Graphical User Interface) routines (not discussed further in this App Note).

(d) Low-level Application Code
The primary code discussed in this App Note (see Listing 1 and Listing 2).

(e) CO-MEM Lite Interface Library (comemLif.lib)
Major functions:
1. Allows the application to create objects of class CO-MEM Lite. This code is architected to easily support multiple AN3042 chips in one system.
2. Provides routines that interface to the Device Driver (f).

For the sample program Test3042, (c), (d) and (e) are statically linked by the Visual C++ linker, producing Test3042.exe. This is indicated by the (c), (d) and (e) boxes actually touching. The short line connecting (e) and (f) indicates that (f) is dynamically (run time) linked to the Test3042 process when Test3042 opens the device driver. This dynamic linking occurs in the call to comemOpenDriverL in TestUtil.cpp (not shown in Listings 1 or 2).

(f) Device Driver
A Ring 0 Device Driver is an essential part of our AN3042 support because only Ring 0 code is privileged to access the Paging Unit. Application code (layers (c) and (d)) running at Ring 3 cannot access the Paging Unit.
By accessing the Paging Unit, the Device Driver provides two vital services to the Ring 3 application:
1. The Device Driver converts the value in the AN3042's BAR0 (Base Address Register 0) from a Physical to a Linear address. BAR0 is initialized by the BIOS at boot time. In Physical Address Space, BAR0 points to the base of all AN3042 functions that are visible from the PCI side (operation registers and Shared Memory). The Device Driver returns the Linear address of the contents of BAR0 to the application so the application can directly access the AN3042.
2. When called upon (through the IOCTL_COMEM_ALLOCCONTIGMEM entry point), the Device Driver allocates 16 KB blocks (four 4 KB Pages) that become the source or destination for AN3042 DMA TRANSFERS. These 16 KB blocks must be contiguous in both Physical and Linear memory space. Contiguity can only be guaranteed by Ring 0 code, because only it has access to the Page Tables. 16 KB is the size of AN3042 Shared Memory, hence 16 KB is the maximum size of a DMA transfer.

The actual Device Driver file in your system (normally installed in c:\windows\system) will be one of two types:
Win 9x    comemL.vxd
Win NT    comemL.sys

As shown by the arrows which go to the host side of (h), it is common for each software layer ((c), (d), (e), and (f)) to directly access the AN3042 (through the Host to PCI Bridge (i)).
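Physical contiguity matters because the DMA controller is given a single physical base address (DMAHBASE) and steps through memory from there; it knows nothing about Page Tables. The sketch below illustrates the property the driver has to guarantee for a four-page buffer. It is a hypothetical check, not driver code: the function name and the idea of receiving a per-page list of physical addresses are assumptions, and Ring 3 code cannot obtain such a list on its own.

```cpp
#include <cstddef>
#include <cstdint>

const std::uint32_t kPageBytes = 0x1000;  // 4 KB pages

// Hypothetical check: pagePhys[i] is the physical address backing the i-th
// 4 KB page of a buffer that is already contiguous in Linear address space.
bool pagesPhysicallyContiguous(const std::uint32_t* pagePhys, std::size_t pageCount)
{
    if (pageCount == 0) return false;
    for (std::size_t i = 0; i < pageCount; ++i) {
        if (pagePhys[i] % kPageBytes != 0) return false;       // page-aligned
        if (i > 0 && pagePhys[i] != pagePhys[i - 1] + kPageBytes)
            return false;                                      // gap or reorder
    }
    return true;
}
```

If the second page of a buffer were mapped somewhere else in physical memory, a 16 KB DMA starting at the first page's physical address would run into whatever happens to follow that page, which is why ordinary heap allocations are not suitable DMA targets.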
Hardware
Test3042 and other host processor software access the hardware components by generating memory accesses that are converted by the Paging Unit (see Note 3) from Linear to Physical addresses.

(g) Host Memory - Is in the same Physical Address Space as PCI devices (but of course at a different location within that space).
(h) Chipset - These are most commonly Intel Chipsets, such as the Intel(R) 440BX AGPset and Intel(R) 440ZX AGPset. See http://www.intel.com/design/pcisets .
(i) Host to PCI Bridge - A part of the Chipset that interfaces the Pentium, Host Memory, etc., to the PCI bus.
(j) PCI Bus - 32 bit, 33 MHz.
(k) An AN3042 and Local-side Circuit on an add-in board plugged into the PCI Bus.
(l) The Local Circuit. Can consist of:
* a complex state machine, or nearly any microprocessor
* a memory subsystem, which can consist of a mix of memories
* any application circuitry.

Note 3: Intel never uses the term Paging Unit per se in their documentation. They talk about all the low-level components of the paging mechanism, such as Page Tables, Page Directory, etc., but never specifically Paging Unit. We suspect the reason for this is that the paging unit is quite distributed into components on the Pentium chip, and in Page Tables in Host Memory. See the Glossary for the Anchor Chips' definition of Paging Unit.
[Figure 1 is a block diagram. Software, Ring 3: (c) High-level Application Code (GUI: menus, dialog boxes, etc.), (d) Low-level Application Code (doDMA, testDMA), (e) CO-MEM Lite Interface Library (comemLif.lib; comemAllocContigMemL). Software, Ring 0: (f) Device Driver (comemL.vxd or comemL.sys; IOCTL_COMEM_ALLOCCONTIGMEM). Software uses Linear Addresses, which the "Paging Unit" converts to Physical Addresses. Hardware: (g) Host Memory, (h) Chipset containing (i) the Host to PCI Bridge, (j) PCI Bus, (k) AN3042 with 16 KB Shared Memory, (l) Local-side Circuit using Local-side Addresses. The arrow labeled TRANSFERS runs between Host Memory and the AN3042's Shared Memory.]

Figure 1: PC Components Involved in Bus Master DMA Transfers

Notes about Figure 1:
* In general, each software layer calls services in lower layers. However, after a layer has obtained a Linear Address (a `C' pointer) to the AN3042, it is free to read or write directly to the AN3042 without `calling through' lower layers. See the three bi-directional arrows that point to the Host to PCI Bridge, which originate at (d) and (e), as well as (f). Also see the code in Listing 2. Statements such as line 24, g_regp->dmasize = length - 4; access the AN3042 directly.
* All software lives in the domain of Linear Addresses, whereas all hardware lives in the domain of Physical Addresses.
* Application programs in Win9x and WinNT use a flat model (32-bit protected mode). There are exceptions, which are buried in Windows internals and normally don't concern us. For example, Win9x still uses 16-bit protected mode for much of the code in User.exe (one of several key Windows executable files). Test3042 and all new applications that you write should be compiled into flat-model 32-bit code.
Other Important Firmware and Software Components
* The Plug and Play (PnP) BIOS (a part of the System BIOS). At boot time, during the first second after Reset is released, the PnP BIOS scans the PCI bus and assigns a unique Physical Address range (`Region') to each PCI BAR (Base Address Register). This process is called enumeration. Since the PnP BIOS is a fixed component on the motherboard, it `knows' which resources are required by devices on the motherboard (such as Host Memory), and can reserve those, assigning the remaining Physical Addresses to add-in devices. See the PCI 2.1 Specification (Ref. 3) for details about the device scan, BAR operation, etc. When Windows boots, it may decide to redo what the PnP BIOS did. For example, if Windows doesn't think that the device has a valid device driver (Windows can't find a match in the Device Driver Database), it will do the `safe' thing, disabling the device by writing zero to all its BARs.
* Windows - Called on by the application and the Device Driver for many services.
* comemL.inf - The major file that tells the Windows installation code how to match the AN3042 to the correct Device Driver.
* The Registry.
* The Device Driver Database, which consists of all .bin files in the directory c:\windows\inf. These files are a compiled database of all the .inf files in the c:\windows\inf tree. You can delete them to force the Plug and Play system to rebuild this database during the next reboot.
The Sample Code
See Listings 1 and 2. Important code is in bold type. Since the code is heavily commented, please see the code for details. The code consists of two C++ functions: doDMA and testDMA. (If you don't know C++, don't panic. We've used very few of the C++ extensions to C.) testDMA is intended to be called by higher level code after the device driver has been opened, then testDMA calls doDMA.
Performance
Typical sustained performance for this code is 75 MB/sec. for Reads from Host Memory, and 100 MB/sec. for Writes to Host Memory. By taking certain shortcuts (not shown in Listings 1 and 2), we have been able to push Write performance to over 120 MB/sec., remarkably close to the 132 MB/sec. theoretical limit of a 32-bit 33 MHz PCI bus. The overhead to set up the next transaction, and pauses during a transfer, is extremely low for this code and the AN3042. Large transfers (16 KB) are fastest because the key factor in overall speed is the time to set up the next transfer. Most of the time to set up the next transfer is consumed by transactions on the PCI bus itself, not execution of the code in the Pentium core. When you reach the optimization phase of your project, try to minimize the actual DMA setup and polling transactions on the bus. They are much slower than execution of instructions in the Pentium core. For example, a write to an AN3042 Op Reg takes about 6 PCI clocks = 180 ns. In the same time (assume a 300 MHz Pentium = 3ns instruction execution time, and DMA code running from the Pentium's instruction cache), the processor core will execute about 60 instructions. If we can eliminate one PCI transaction, we can save the time it takes to run about 60 instructions!
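The figures in this section follow from simple arithmetic, sketched below so you can substitute your own clock rates. The function names are illustrative only, and 1 MB is taken as 1,000,000 bytes; the inputs (33 MHz, 6 PCI clocks per Op Reg write, 3 ns per instruction) come from the text above.

```cpp
// Peak rate of a 32-bit PCI bus in MB/sec: one 4-byte data phase per clock.
double pciPeakMBps(double clockMHz)
{
    return 4.0 * clockMHz;
}

// Duration of a number of PCI clocks, in nanoseconds.
double pciClocksToNs(double clocks, double clockMHz)
{
    return clocks * 1000.0 / clockMHz;
}

// Instructions the core could execute during an interval, given the
// (assumed) average time per instruction.
double instructionsInInterval(double intervalNs, double nsPerInstruction)
{
    return intervalNs / nsPerInstruction;
}

// Time in microseconds to move a block at a sustained rate in MB/sec.
double transferMicroseconds(double bytes, double rateMBps)
{
    return bytes / rateMBps;
}
```

For example, pciPeakMBps(33.0) gives the 132 MB/sec limit, pciClocksToNs(6, 33.0) gives roughly 180 ns for one Op Reg write, and transferMicroseconds(16384, 100) shows that a full 16 KB block at 100 MB/sec takes on the order of 160 microseconds, so a handful of setup and polling transactions per block is a small but measurable fraction of the total.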
Limitations of This Sample Code
Interrupts
doDMA (Listing 2) polls the AN3042 for DMA completion. At the expense of considerable complexity, an interrupt-driven DMA device driver could be written. There is a possibility (though not a certainty) that overall system performance would be improved using interrupts, because interrupts would allow processor cycles that are consumed in the spin loop (waiting for DMA completion) to be used by other threads to improve overall system responsiveness.

Throughput across the PCI bus, however, would probably not be improved, because PCI throughput is mostly limited by:
* DMA setup time
* PCI bandwidth consumed by other PCI devices
* The ability of the Target device (the Host to PCI Bridge and the Host Memory system) to source or sink the data. Newer chipsets can be more efficient than older ones, because they use a variety of techniques to sustain zero wait-state PCI bursts.
* Loading on Host Memory by other agents, for example, the AGP bus, and instruction fetches that must actually fetch from Host Memory (they missed in both Level 1 and Level 2 caches).

Polling is reasonably efficient because DMAs are done quickly -- 16 KB transfers complete in about 150 µsec. Interrupts may not improve performance much because of the overhead involved in interrupt processing:
* saving and restoring processor state
* a small amount of cache thrashing when the context is switched.
Data To/From the Local-Side Memory
In some applications, it may be OK to leave the data in Shared Memory and access it from the Local-side Circuit without moving it to other memory on the Local Side. In most applications, however, an extra step would be required: moving the data from the Shared Memory to a Local-side memory so the Shared Memory could be used for the next load of data. The code in doDMA does nothing to prod the Local-side Circuit to move the data from 3042 Shared Memory to its own local memory.

Here's a possible double-buffered scheme to move the data from Shared Memory to Local-side memory, designed to maximize sustained throughput. (For simplicity of explanation, assume data is moving from Host Memory to the Local-side memory.) The 16 KB Shared Memory is split into two halves. A single-DWORD guard band at address 0 and 0x2000 is required because of Errata #4 (Local bus Read access to the AN3042 Shared Memory will block PCI Write access to Shared Memory in some cases).
1. The AN3042 DMA controller moves data from Host Memory to the 1st half of Shared Memory. Simultaneously, the Local-side Circuit moves data from the 2nd half of Shared Memory to the Local-side Memory (data put there from the Host Memory the last time the PCI side `owned' the 2nd half).
2. When both halves are transferred, the processors synchronize through the mailbox registers (HLDATA or LHDATA).
3. The programs on each side change buffer pointers: Host points to the 2nd half, Local points to the 1st half.
4. Both processors then start their transfers to/from their new half: Host to the 2nd half, Local from the 1st half.

AN3042 DMA can be set up and initiated by the Pentium (`Host-initiated DMA'), or by the Local circuit (`Local-initiated DMA').
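The pointer bookkeeping for this double-buffered scheme can be sketched as follows. This is only an illustration of steps 1, 3 and 4: the type and function names are hypothetical, the guard band is interpreted as the first DWORD of each half (so payload starts at offsets 4 and 0x2004), and the mailbox synchronization of step 2 is not modeled.

```cpp
#include <cstdint>

const std::uint32_t kHalfBytes  = 0x2000;  // Shared Memory split into two 8 KB halves
const std::uint32_t kGuardBytes = 4;       // assumed: one guard DWORD at 0 and 0x2000

// Tracks which half the Host-side DMA fills next; the Local side drains the other.
struct PingPong {
    unsigned hostHalf;  // 0 = 1st half, 1 = 2nd half
};

// Shared Memory byte offset where the Host-side DMA should write.
std::uint32_t hostOffset(const PingPong& p)
{
    return p.hostHalf * kHalfBytes + kGuardBytes;
}

// Shared Memory byte offset the Local-side Circuit should read from.
std::uint32_t localOffset(const PingPong& p)
{
    return (1u - p.hostHalf) * kHalfBytes + kGuardBytes;
}

// Usable bytes per half once the guard DWORD is excluded.
std::uint32_t payloadBytes()
{
    return kHalfBytes - kGuardBytes;
}

// Step 3: after both sides have synchronized, exchange the halves.
void swapHalves(PingPong& p)
{
    p.hostHalf = 1u - p.hostHalf;
}
```

With this bookkeeping the Host side would pass hostOffset() as the Shared Memory start and payloadBytes() as the length of each DMA, call swapHalves() only after the mailbox handshake, and never touch the half the Local side is draining.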
Sharing the AN3042 DMA Controller
For the code in Listings 1 and 2, we've assumed that only one thread will access the DMA controller. If more than one thread on the host side wants to access the DMA Controller, or if the Local-side processor might access the DMA controller, code to perform software arbitration must be added. Ref. 5 recommends using the VDMAD (Virtual DMA Driver, a VxD) to virtualize (see Note 4) any DMA controller. Additionally, you may want to use the L and P bits in the DMACTL register to arbitrate between Host-side and Local-side ownership of the AN3042's DMA controller.

Note 4: To virtualize a device is to provide an intermediary VxD between the application layers and the device. (The computer science community loves the word virtual.) A VxD, a Virtual Anything (x) Driver, is then called by any thread that wants to access the device. The VxD maintains a separate state table for each thread, arbitrates for the device, and manages any conflicts that arise.
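For the simplest case, several threads inside one host application sharing the controller, a lock around the whole program-and-poll sequence is enough. The sketch below uses std::mutex from modern C++ as a stand-in; it is not the VDMAD approach recommended above, DmaGate is a hypothetical name, and it does nothing to arbitrate with other processes or with the Local side (which is what the L and P bits are for).

```cpp
#include <mutex>

// Serializes access to one DMA controller among threads of a single process.
class DmaGate {
public:
    // Runs fn while holding the lock and returns whatever fn returns.
    // fn would typically wrap one complete DMA, from setup to completion.
    template <typename Fn>
    auto run(Fn fn) -> decltype(fn())
    {
        std::lock_guard<std::mutex> hold(mutex_);
        return fn();
    }

private:
    std::mutex mutex_;
};
```

A caller would keep one DmaGate per AN3042 and wrap each transfer, for example gate.run([&] { return doDMA(hostPhys, SMstart, DMAsize, direction, comemID); }), so that no second thread can rewrite the DMA registers while a transfer is in flight.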
Listing 1: testDMA
DWORD g_linBAR[COMEM_MAX_BARS]
// Global array which will contain the Linear BAR0 and BAR1 addresses.
// This structure MUST be designated 'volatile' for the VC++ 'Release' version (as opposed to the 'Debug' version). // If this structure is not designated 'volatile', optimizers destroy the intent of inner-loop code in doDMA below. volatile struct op_regs_struct_42_t * g_regp; // // testDMA - An example function that tests the 3042's DMA controller that: // - allocates and deallocates the Host Mem physical pages // - writes patterns to the source and destination memory // - calls doDMA to initiate the transfer // - checks the result of the 3042 DMA transfer. // DWORD testDMA(DWORD direction, DWORD comemID) { DWORD lin, phys, i, returnCode; int errorCnt = 0; printf ("\n"); if (direction == directionToHost) printf ("DMA data from 3042 Shared Mem to Host Mem...\n"); else printf ("DMA data from Host Mem to 3042 Shared Mem...\n"); // Get the Linear BAR0 and BAR1 pointers (we only use BAR0 here) so we can access the 3042's registers. // You may want to move this out of testDMA, doing it only once at the start of your application. // // Throughout this code, 'L' at the end of function names means Lite, as in CO-MEM Lite. returnCode = comemCopyBarPtrL(g_linBAR, comemID); if (returnCode != NO_ERROR) { reportErrorCode(returnCode, "creating BAR pointers."); errorCnt++; } // Setup the structure template we use to access the 3042 op regs. g_regp = (struct op_regs_struct_42_t *) (g_linBAR[0] + OP_REGS_BASE_42);
// Allocate 4 Pages (4 KB each, 16 KB total = size of 3042 Shared Mem) which are (must be!) // contiguous in Physical (and Linear) memory space. // // We pass the Physical address to the 3042's DMA controller DMAHBASE register. // We need the Linear address to reference the Pages from this program, which runs on the 'Linear side' of // the Pentium's Paging Unit. // // This allocation may be done once when the application is launched. However, here we chose to // allocate and deallocate for each invocation of testDMA. Remember to pair allocations with // deallocations. returnCode = comemAllocContigMemL (4, // In: Number of 4 KB Pages. comemAllocContigMemL can allocate 1 to 4 Pages, // but must allocate 4 for this test. &lin, // Out: Returns the Linear address of the allocated Pages. &phys, // Out: Returns the Physical address of the allocated Pages. comemID); // In: The 3042 that 'owns' the pages. if (returnCode != NO_ERROR) { printf("Error in comemAllocContigMemL."); return (++errorCnt); } DWORD * hostLin = (DWORD *) lin; DWORD * hostPhys = (DWORD *) phys; // For this test move the max block size. Start at the beginning of Shared Mem and use the maximum DMAsize. DWORD SMstart = 0; DWORD DMAsize = 0x4000; // 16 KB if (direction == directionToHost) { // Initialize source buffer (3042 Shared Mem). // // initSharedMem writes to the 3042's Shared Mem as a PCI target, which is about 10 times // slower than the DMA transfer. initSharedMem(SMstart, DMAsize, ADDR_PATTERN, comemID); // Write pattern to destination, so this test can't pass unless 3042 DMA controller actually moves the data. // printf ("Initializing destination at Linear 0x%08x pattern=0xFEEDBEEF\n", hostLin); for (i = 0 ; i < DMAsize/4 ; i++) hostLin[i] = 0xFEEDBEEF; } else
12
Anchor Chips Incorporated
    {
        // Initialize source buffer in Host Mem
        // printf ("Initializing source at Linear 0x%08x DMAsize=0x%x pattern=ADDR_PATTERN\n", hostLin, DMAsize);
        for (DWORD i = 0 ; i < DMAsize/4 ; i++)
            hostLin[i] = i;

        // Write pattern to destination, so this test can't pass unless 3042 DMA controller actually moves the data.
        initSharedMem (SMstart, DMAsize, FILL_PATTERN, comemID);
    }

    // Set 3042's Latency Timer to the max (at the expense of potentially causing latency issues
    // for other PCI devices).
    // You'll definitely want to move this to an initialization routine to do it only once.
    PCI_CONFIG_HEADER_0 temp;
    returnCode = comemGetPCIInfoL(&temp, comemID);
    temp.LatencyTimer = 0xff;
    returnCode = comemSetPCIInfoL(&temp, comemID);

    // Set up to calculate the time and MB/sec ('bandwidth') for the DMA transfer. Use the built-in
    // Pentium high-performance timer.
    LARGE_INTEGER llnHPTimerFreq;   // High Performance Timer: Frequency
    LARGE_INTEGER llnHPT1;          // High Performance Timer: Time 1
    LARGE_INTEGER llnHPT2;          // High Performance Timer: Time 2
    LARGE_INTEGER llnT_uSec;        // Time in microseconds
    QueryPerformanceFrequency(&llnHPTimerFreq);
    QueryPerformanceCounter(&llnHPT1);

    // Note: 'Host Mem' can be PCI Memory (physical) addresses other than those assigned to Host Memory.
    // In this example, we use Host Mem as the source and destination. But the 3042 does
    // support peer-to-peer DMA transfers. Get the physical addresses of the peer PCI devices from their BARs.
    errorCnt += doDMA ( hostPhys,   // host-side physical address
                        SMstart,    // Shared Mem start (first byte of shared mem = address 0)
                        DMAsize,    // size of DMA transfer (in bytes)
                        direction,  // the direction of the transfer
                        comemID);   // which 3042 are we driving

    if (QueryPerformanceCounter(&llnHPT2))
    {
        llnT_uSec.QuadPart = ( (llnHPT2.QuadPart - llnHPT1.QuadPart)        // Delta time in ticks.
                               * ((LONGLONG)1E9/llnHPTimerFreq.QuadPart) )   // Adjust for the frequency.
                             / (LONGLONG)1E3;
        double bandwidth = (double)1E6 * (double)DMAsize / (double)llnT_uSec.LowPart;
        printf("DMA completed in %3d sec %03d msec %03d usec: DMA Rate = %11.2le bytes/sec\n",
               llnT_uSec.LowPart/(LONG)1E6,
               (llnT_uSec.LowPart/(LONG)1E3) % 1000,
               llnT_uSec.LowPart % 1000,
               bandwidth );
    }

    if (errorCnt > 0)
    {
        printf ("Error: doDMA call failed hostPhys=%08x SMstart=%08x DMAsize=%08x direction=%d comemID=%d\n",
                hostPhys, SMstart, DMAsize, direction, comemID);
        // If an error, is probably fatal, but we'll attempt to forge ahead anyway.
    }

    // Check the destination.
    if (direction == directionToHost)
    {
        for (i = 0 ; i < DMAsize/4 ; i++)
        {
            if (hostLin[i] != i)
            {
                if (errorCnt++ < 8)     // Report first 8 errors only.
                    printf ("Error: PCIAddr=%08x LocalAddr=%08x Wrote=%08x Read=%08x\n",
                            hostLin+i, i, i, hostLin[i]);
            }
        }
        if (errorCnt > 0)
            printf ("%d errors total\n", errorCnt);
    }
    else
    {
        errorCnt += checkSharedMem (SMstart, DMAsize, ADDR_PATTERN, comemID);
    }

    // Deallocate the 4 Pages we allocated above.
    comemDeAllocContigMemL(&lin, &phys, comemID);

    return(errorCnt);
} // end testDMA
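The timing arithmetic in Listing 1 scales the tick delta by an integer nanoseconds-per-tick factor, (LONGLONG)1E9/llnHPTimerFreq.QuadPart. That integer division truncates, and the factor becomes zero for any counter frequency above 1 GHz. The following is a minimal sketch of an alternative, not part of the Anchor Chips sample code. The function names ticksToMicroseconds and bytesPerSecond are hypothetical, and the sketch assumes the tick delta times one million fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick delta from a high-resolution counter into microseconds.
// Multiplying before dividing keeps the fractional part of the tick period
// that an integer "nanoseconds per tick" factor would throw away.
// Assumption: deltaTicks * 1000000 fits in 64 bits (true for short transfers).
std::int64_t ticksToMicroseconds(std::int64_t deltaTicks, std::int64_t frequencyHz)
{
    assert(frequencyHz > 0);
    return (deltaTicks * 1000000) / frequencyHz;
}

// Transfer rate in bytes per second for a transfer that took elapsedUsec microseconds.
// Returns 0.0 when the elapsed time is zero or negative, avoiding a divide by zero.
double bytesPerSecond(std::uint32_t bytes, std::int64_t elapsedUsec)
{
    if (elapsedUsec <= 0)
        return 0.0;
    return 1e6 * static_cast<double>(bytes) / static_cast<double>(elapsedUsec);
}
```

In testDMA, the arguments would be the difference of the two QueryPerformanceCounter readings and the value returned by QueryPerformanceFrequency.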
Listing 2: doDMA
// doDMA
//
// Transfer up to 16KB via the 3042's DMA controller.
//
// Although in this example, we transfer between Host Mem and the 3042's Shared Mem, we could
// use doDMA *unchanged* to do peer-to-peer transfers between devices on the PCI bus.
//
DWORD doDMA (DWORD * PCIphysMem,    // The PCI (physical) Memory space address (as opposed to I/O or Config space addresses).
                                    // The 3042 needs this because it knows nothing (directly) about Linear addresses.
             DWORD SMstart,         // Shared Mem start. Is 0 to point to the zeroth DWORD of 3042 Shared Mem.
             DWORD length,          // Length in bytes. Gets converted to DWORDs below.
             DWORD direction,       // To or From Host Mem.
             DWORD comemID)         // Which 3042 we're driving.
{
    DWORD errorCnt = 0;

    // Set the starting Shared Mem address in DMALBASE register.
    g_regp->dmalbase = SMstart;     // Give it a byte address, but since DMALBASE bits 0 and 1 are dead,
                                    // this loads a DWORD address. The L in DMALBASE means Local, as in the
                                    // Local-side interface.

    // Set the DMASIZE.
    g_regp->dmasize = length - 4;   // Give it a byte size, but since DMASIZE bits 0 and 1 are dead,
                                    // actually loads a DWORD size. Also, the value written to the DMASIZE register
                                    // must be reduced by one DWORD (4 bytes).

    // Set the DMAHBASE to the PCI (physical) Memory address
    g_regp->dmahbase = (DWORD) PCIphysMem;  // Once again, we give it a byte address, but because bits 0 and 1 are dead
                                            // loads a DWORD address.

    // Clear the DMA complete bit.
    g_regp->hint &= DMA_COMPLETE_BIT;

    // Write low-order byte of DMACTL to set the direction, and to KICK OFF the DMA.
    // The only operations that MUST be done on each DMA transfer are the kick off and the wait for DMA_COMPLETE_BIT.
    g_regp->dmactl_v.dmactl_bytes[0] = (unsigned char) direction;

    // Now the 3042 DMA masters the PCI bus and moves the data. Looking at the PCI bus, you'd see
    // mostly bursts of data moved by the DMA, with occasional polls of HINT. The length of the
    // bursts is determined by the value in the 3042's Latency Timer, and other PCI bus activity
    // (including our HINT polling).

    // Wait for the DMA complete bit to be set.
    //
    // We also want to check for a timeout here, because we don't want our thread to hang with no notification
    // if something goes wrong.
    const DWORD loopsPerByte = 0x100;       // Number of times through this 'while' loop per byte of DMA.
    int timeout = length * loopsPerByte;    // Give lots of time to complete.
    while ( ((g_regp->hint & DMA_COMPLETE_BIT) != DMA_COMPLETE_BIT) && timeout )
    {
        // Reduce the frequency of HINT polling to improve performance, but poll often enough
        // that a poll occurs soon after the end of the DMA transfer (ideally just after the end).
        //
        // The optimal delay to be inserted here is dependent on:
        //   - your processor speed
        //   - the length of DMA transfer
        //   - characteristics of the PCI bus arbiter, the Latency Timer setting, and other PCI bus traffic
        //   - etc., etc.
        // A loop count of 150 works well for the 16 KB blocks, 200 MHz Pentium Pro and 440FX Chipset used to tune this example.
        for (volatile DWORD i = 0 ; i < 150 ; i++)
            ;
        timeout--;
    }
    if (timeout <= 0)
    {
        errorCnt++;
        printf ("Error: timed out before DMA_COMPLETE_BIT set. errorCnt=%d\n", errorCnt);
    }
    return (errorCnt);
} // end doDMA
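Listing 2 writes length - 4 to DMASIZE without checking the length, and polls HINT with a loop-count timeout. The sketch below isolates those two ideas so they can be exercised without hardware. It is an illustration under stated assumptions, not part of the sample code: encodeDmaSize and waitForBits are hypothetical helpers, the 16 KB ceiling is taken from the doDMA header comment, and the status register is an ordinary variable standing in for the real HINT register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compute the value to write to DMASIZE for a transfer
// of lengthBytes. Mirrors Listing 2: the register takes a byte count reduced
// by one DWORD, and bits 0 and 1 are ignored, so the length must be a nonzero
// multiple of 4. The 16 KB limit comes from the doDMA header comment.
bool encodeDmaSize(std::uint32_t lengthBytes, std::uint32_t* regValue)
{
    assert(regValue != nullptr);
    const std::uint32_t maxBytes = 16 * 1024;
    if (lengthBytes == 0 || lengthBytes > maxBytes || (lengthBytes & 3u) != 0)
        return false;
    *regValue = lengthBytes - 4;
    return true;
}

// Hypothetical helper: read a status register until every bit in mask is set,
// giving up after maxPolls reads. Returns true if the bits were seen set.
bool waitForBits(const volatile std::uint32_t* statusReg, std::uint32_t mask, std::uint32_t maxPolls)
{
    assert(statusReg != nullptr);
    for (std::uint32_t polls = 0; polls < maxPolls; ++polls) {
        if ((*statusReg & mask) == mask)
            return true;
    }
    return false;
}
```

Rejecting a bad length before touching the registers lets the caller report the problem directly, rather than discovering it later as a timeout.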
Recommended Tools
* Microsoft Visual C++ 5.x. We are not presently supporting VC++ 6.x. We do not intend to support older versions of VC++.
* SoftICE by Compuware(R) NuMega at www.numega.com. We've found this tool useful, especially for the following commands:
    * pci - dumps Config Regs for all PCI devices. Helps you find out if the system even sees your card (do the BARs have reasonable values? is Bus Mastering enabled?). Many other uses.
    * peekd, poked - reads and writes to PCI devices (Physical addresses). This is a good way to bypass software that is under development and talk directly to the AN3042.
    * phys - displays all Linear Addresses for a given Physical Address. Is a good sanity check on the Linear Addresses returned by routines such as comemCopyBarPtrL.
* A logic analyzer to look at PCI bus transactions, preferably with PCI transaction formatting software. We currently use an HP 1660C with FuturePlus FSPCI64E support hardware and software. Although there is some elegance in dedicated PCI analyzers (VMetro, HP and others), we have found the HP 1660C and FuturePlus cost effective and functionally adequate.
* A reasonable library. See References.
Glossary
Linear address - addresses on the processor (hence program) side of the Paging Unit. See References 2, 5 and 6. Also known as User Space or Virtual Address. For the converse, see Physical address.

Memory Space - A transaction on the PCI bus can access one of three independent address spaces:
* Configuration (Config) Space - accesses the Config Regs in any device
* Memory Space - used for most transactions after the system is configured. Supports bursting.
* I/O Space - provided only for backward compatibility with x86 processor I/O space. Because most PCI devices do not support bursting for transactions in I/O Space, it should be used only where absolutely necessary - to support legacy devices. Legacy devices (such as VGA cards) perform many operations using registers mapped into I/O space.
Note that it is possible for a Memory Space transaction to not access Host Memory. A Memory Space transaction can access Host Memory, or can access any other memory or register on the PCI bus that is mapped via the BARs into Memory Space. A Master tells the Target which Space it wishes to access via the Command emitted by the bus master during the Address Phase (see p. 21 of Reference 3), and by the LS bit (the 'space indicator' bit) of each BAR (0 = Memory, 1 = I/O). See Physical Address.
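The space indicator bit described under Memory Space can be decoded with a few lines of code. This is a minimal sketch for 32-bit BAR values as read from Config Space. The names PciSpace, barSpace and barBaseAddress are hypothetical and do not come from the CO-MEM library.

```cpp
#include <cassert>
#include <cstdint>

enum class PciSpace { Memory, Io };

// Bit 0 of a BAR is the space indicator: 0 = Memory Space, 1 = I/O Space.
PciSpace barSpace(std::uint32_t bar)
{
    return (bar & 0x1u) ? PciSpace::Io : PciSpace::Memory;
}

// Strip the low-order flag bits to leave the base address.
// Memory BARs keep flags in bits 3:0; I/O BARs keep flags in bits 1:0.
std::uint32_t barBaseAddress(std::uint32_t bar)
{
    return (barSpace(bar) == PciSpace::Io) ? (bar & ~0x3u) : (bar & ~0xFu);
}
```

For a Memory Space BAR, the base address returned here is the Physical address that a peer device would use as a DMA source or destination.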
Paging Unit - The collection of Pentium components that perform Linear to Physical and Physical to Linear address translation. The 'Paging Unit' consists of:
* the Page Directory
* Page Tables
* assorted bits in Pentium Control Registers (CRs) that determine various paging modes
* Windows kernel code that manages the assorted tables and bits.
The CRs reside in the core of the Pentium. The Tables reside in Host Memory, but the most recently used entries are cached in the core of the Pentium (in the TLBs - Translation Lookaside Buffers).
The Paging Unit also handles:
* Demand Paging, which allows code and data to be swapped into and out of Host Memory from a disk swap file as needed (on demand). See Chapter 13 of Reference 6.
* Protection, which allows a Process to own a set of pages, prohibiting other code from accessing those pages.
Pages are often locked down by the Paging Unit, so they do not move in Physical Space or get swapped to disk. For example, pages associated with PCI devices and the 16 KB buffers allocated by comemL.vxd or comemL.sys get locked down.

Physical address - addresses as seen by hardware, on the hardware side of the Paging Unit. For our purposes this hardware is all PCI devices and host memory. On the PCI bus, a Physical Address is identical to a PCI Memory Space Address (as opposed to an I/O Space Address or a Configuration Space Address). For the converse, see Linear address.
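To make the Linear to Physical translation performed by the Paging Unit concrete, the sketch below splits a 32-bit Linear address into the three fields the processor uses to walk the Page Directory and a Page Table. It assumes the basic two-level scheme with 4 KB pages (no large pages or address extensions). The structure and function names are hypothetical, and the page frame address passed to physicalAddress stands in for the value that would be read from a Page Table entry.

```cpp
#include <cassert>
#include <cstdint>

struct LinearAddressParts {
    std::uint32_t directoryIndex; // bits 31:22, selects a Page Directory entry
    std::uint32_t tableIndex;     // bits 21:12, selects a Page Table entry
    std::uint32_t pageOffset;     // bits 11:0, byte offset within the 4 KB page
};

// Split a Linear address into its Page Directory index, Page Table index and offset.
LinearAddressParts splitLinearAddress(std::uint32_t linear)
{
    LinearAddressParts parts;
    parts.directoryIndex = linear >> 22;
    parts.tableIndex     = (linear >> 12) & 0x3FFu;
    parts.pageOffset     = linear & 0xFFFu;
    return parts;
}

// Combine the page frame address taken from a Page Table entry with the offset.
std::uint32_t physicalAddress(std::uint32_t pageFrameBase, std::uint32_t pageOffset)
{
    assert((pageFrameBase & 0xFFFu) == 0); // page frames are 4 KB aligned
    assert(pageOffset <= 0xFFFu);
    return pageFrameBase | pageOffset;
}
```

Only the upper 20 bits change during translation. The offset within the page is the same on both sides of the Paging Unit, which is why a locked-down, page-aligned buffer has matching low-order address bits in its Linear and Physical addresses.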
Process - a program (application) that is protected in its own memory space. Normally when you launch an application program, Windows creates a new Process for that program. The new Process may only access memory in its own memory space, or in the operating system through calls to the Win32 API (Application Program Interface).

Rings - Access and instruction execution privilege (protection) levels controlled by Pentium processors.
* Ring 0 - Most privileged protection level on a Pentium. Also known as System Level or Supervisor Level. Code running at this level (such as comemL.vxd) can execute any instruction and access any memory or I/O location, including the page tables.
* Ring 3 - Least privileged protection level on a Pentium. Also known as User Mode or Application Level.
* Ring 1 and 2 - though built into Pentium processors, Ring 1 and 2 are not used by Windows operating systems, or likely by any other operating system.
Thread - a single uninterrupted flow of execution through a program. A Process owns one or more Threads. When created, a Process owns only one Thread. This parent Thread may then explicitly create ("spawn") more Threads as it runs by calling OS services. The Process would then appear to be doing more than one thing at a time. Another key component in this sleight of hand is the Windows Scheduler. The Scheduler time slices between all the active Threads in the system. It gets invoked periodically when timer or other interrupts occur. For example, you could write a device driver that would invoke the Scheduler whenever the AN3042 DMA completion interrupt fires. In general, the Scheduler selects the highest priority Thread from the current list of ready-to-run Threads, from any Process.
References
1. CO-MEM Lite, AN3042 Integrated Circuit, Technical Reference Manual. Anchor Chips Incorporated.
2. Intel Architecture Software Developer's Manual Volume 3: System Programming Guide. Intel Corporation, 1997. The Intel Architecture Software Developer's Manual consists of three books and three addenda, all available at http://developer.intel.com/design/pentium/manuals :
    Volume 1: Basic Architecture. Order Number 243190
    Volume 2: Instruction Set Reference Manual. Order Number 243191
    Volume 3: System Programming Guide. Order Number 243192
    Addendum: Volume 1: Basic Architecture. Order Number 243691
    Addendum: Volume 2: Instruction Set Reference Manual. Order Number 243689
    Addendum: Volume 3: System Programming Guide. Order Number 243690
3. PCI Local Bus Specification, Revision 2.1. PCI Special Interest Group, Portland, OR.
4. Shanley, Tom, Anderson, Don. PCI System Architecture, Third Edition. MindShare, Inc., 1995. Addison-Wesley.
5. Hazzah, Karen. Writing Windows VxDs and Device Drivers. R&D Books, Lawrence, KS, 1997.
6. Shanley, Tom. Protected Mode Software Architecture. MindShare, Inc., 1996. Addison-Wesley.