NEWTON, Ask A Scientist!
Reading Cache Memory
Name: Kamal D.
Status: Other
Age: 20s
Location: N/A
Country: N/A
Date: Around 2001


Question:
How can we read and store data from the processor's cache memory?

Is this possible in Java or in VB 6.0 through the Win32 API?



Replies:
If you read and store any data at all, it will at some point pass through the processor's cache memory. That is what the cache is there for: to automatically increase the execution speed of your program by temporarily holding copies of the contents of main memory.

If you wish to use the cache as "extra memory" for your data, that is not how the cache works, and not what it was designed for. It is designed to bridge the gap between extremely fast processors and slow main memory (RAM). The cache controller copies areas of main memory that it predicts will be needed into the cache memory, which the processor can then access at high speed. If something isn't in the cache, the processor has to wait while it is retrieved from main memory.
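To make the hit-or-wait behavior concrete, here is a toy model of a direct-mapped cache. It is only a sketch: the class name, the 32-byte line size, and the 256-line (8 KB) capacity are my own illustrative assumptions, and a real cache controller is considerably more sophisticated.

```java
// Toy model of a direct-mapped cache. All names and sizes are assumptions
// chosen for illustration; this does not describe any particular processor.
class DirectMappedCacheSketch {
    private final int lineSize;   // bytes copied from main memory per miss
    private final long[] tags;    // memory block held by each slot (-1 = empty)
    long hits = 0;
    long misses = 0;

    DirectMappedCacheSketch(int lineSize, int lineCount) {
        this.lineSize = lineSize;
        this.tags = new long[lineCount];
        java.util.Arrays.fill(tags, -1L);
    }

    // Simulate one read of the byte at `address`.
    void access(long address) {
        long block = address / lineSize;         // memory block containing the address
        int slot = (int) (block % tags.length);  // the single slot this block may occupy
        if (tags[slot] == block) {
            hits++;                              // already cached: fast
        } else {
            misses++;                            // not cached: wait for main memory
            tags[slot] = block;                  // the whole line is now cached
        }
    }

    // Read `count` 4-byte values spaced `strideBytes` apart; return the miss count.
    static long missesFor(int count, int strideBytes) {
        DirectMappedCacheSketch cache = new DirectMappedCacheSketch(32, 256);
        for (int i = 0; i < count; i++) {
            cache.access((long) i * strideBytes);
        }
        return cache.misses;
    }

    public static void main(String[] args) {
        System.out.println("sequential misses: " + missesFor(1024, 4));
        System.out.println("strided misses:    " + missesFor(1024, 32));
    }
}
```

In this model, reading 1024 adjacent 4-byte values misses once per 32-byte line (128 misses), while reading 1024 values spaced 32 bytes apart misses on every read (1024 misses). The data volume requested is the same; only the access pattern differs.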

As far as directly controlling the cache, processors generally do not provide instructions that let an ordinary program read or write the cache's contents. Instead, you should research how your particular cache controller functions, and how it expects memory to be accessed. By ensuring your program follows these same patterns, you let the cache controller perform optimally with your program.

It is possible to optimize the execution speed of a program by arranging your code for optimal use of the cache, but as I will attempt to explain below, it is generally a waste of time unless you simply want to know how it works. If that is the case, check out Intel's site. Here are some pages:
http://developer.intel.com/design/chipsets/applnots/memory.htm
http://developer.intel.com/design/intarch/techinfo/430TX/TX_intro.htm
http://developer.intel.com/design/intarch/techinfo/Pentium/instsum.htm

For Java, this is a pointless exercise. To optimize for the cache you need to know exactly which processor the program will run on, not just that it will run on Windows, and you also need control over every instruction executed and every piece of data that the processor accesses. With Java, you generally won't know which processor the code is going to run on (it is intended to be a cross-platform language), and you will not have direct control over the instructions executed. This is because the Java virtual machine interprets your bytecode, or compiles it to machine code on the fly, so the instructions you write never go to the processor in the form you wrote them.

VB 6.0 is a little better in that it is compiled, so at least the executable code does go directly to the processor. But VB 6.0 is designed to hide all the messy details of exactly which instructions and memory addresses the processor is using, which makes it very hard to optimize for the cache. A good first step, however, is to make your code as small as possible, use small arrays, and access each element of an array in order inside a loop. After the first pass over the array, the cache should have fetched everything. As long as your loop is small and makes no function calls, and your array is small (< 8k), the cache should work pretty well.
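The effect of keeping the array small can be sketched with a simulated cache. The code below is an illustration under stated assumptions, not a measurement: the method name is made up, and the model assumes a direct-mapped cache of 256 lines of 32 bytes (8 KB), chosen to match the "< 8k" rule of thumb above.

```java
// Counts simulated cache misses when an array is read front to back several
// times. The cache geometry (256 lines x 32 bytes, direct-mapped) is an
// assumption for illustration only.
class RepeatedPassSketch {
    static long countMisses(int arrayBytes, int passes) {
        final int lineSize = 32;
        final int lineCount = 256;
        long[] tags = new long[lineCount];
        java.util.Arrays.fill(tags, -1L);          // -1 marks an empty slot
        long misses = 0;
        for (int p = 0; p < passes; p++) {
            for (int addr = 0; addr < arrayBytes; addr += 4) {  // 4-byte elements
                long block = addr / lineSize;
                int slot = (int) (block % lineCount);
                if (tags[slot] != block) {         // line not present: fetch it
                    misses++;
                    tags[slot] = block;
                }
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println("4 KB array, 10 passes:  " + countMisses(4096, 10));
        System.out.println("16 KB array, 10 passes: " + countMisses(16384, 10));
    }
}
```

Under these assumptions the 4 KB array misses only during its first pass (128 misses in total), because it fits in the cache and stays there. The 16 KB array keeps evicting its own lines, so every pass misses on every line (5120 misses over ten passes).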

Optimizing the cache through some kind of API would be pretty pointless. As soon as you call into the Windows kernel, a different section of memory is pulled into the cache, and when the call returns, your program's code and data have to be loaded back in. This would waste a great deal of time and lose most of the optimization you were hoping for.

Also, optimizing for the cache generally yields only a small performance increase overall. For an especially time-critical section of code, it can be useful, especially if the code is small. This is generally best done in assembly language with the help of a processor-specific performance analyzer such as VTune for Intel processors.

If your code really does need to be faster, looking at the cache should be one of the final steps in the process. In general, the steps would be:

1. Don't use slow frameworks or third party libraries.

2. Use a better, faster algorithm.

3. Use a faster language. Java and VB are great, but they are not designed for speed. C would be a much better choice. Just switching to C can often improve performance in a big way.

4. Eliminate extra instructions inside of loops.

5. Make your loops as small as possible.

6. Process more than one set of data for each iteration of the loop.

7. If you have done all these and it still isn't fast enough, consider switching to assembly language.

8. Write the best assembly you can, and then analyze it with a processor-specific performance analyzer.

9. Learn all the intricate details of your processor and optimize for all the features: data alignment, cache, pipelining, etc.
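Steps 4 and 6 in the list above can be sketched in a few lines. This is a hypothetical example of my own (the method names and the arithmetic are made up), and a modern compiler or JIT may well perform both transformations for you, so measure before relying on it.

```java
// Illustrates removing work from inside a loop (step 4) and handling several
// elements per iteration (step 6). Names and the computation are hypothetical.
class LoopTuningSketch {
    // Baseline: the factor is recomputed on every pass through the loop.
    static long weightedSumNaive(int[] data, int gain, int offset) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += (long) data[i] * (gain * 2 + offset);
        }
        return sum;
    }

    // Tuned: the invariant factor is computed once, and each iteration
    // processes four elements, so the loop overhead is paid a quarter as often.
    static long weightedSumTuned(int[] data, int gain, int offset) {
        final long factor = gain * 2 + offset;
        long sum = 0;
        int i = 0;
        int limit = data.length - 3;
        for (; i < limit; i += 4) {
            sum += (data[i] + (long) data[i + 1] + data[i + 2] + data[i + 3]) * factor;
        }
        for (; i < data.length; i++) {   // leftover elements when length % 4 != 0
            sum += data[i] * factor;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(weightedSumNaive(data, 3, 1));
        System.out.println(weightedSumTuned(data, 3, 1));
    }
}
```

Both methods return the same value for the same input; the tuned version simply does less bookkeeping per element. Integer arithmetic is used here so that regrouping the additions cannot change the result, which would not be guaranteed with floating point.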

If speed is critical, the first step you should take is to find out what is taking the time. It is no good optimizing Java code if the Java Virtual Machine is taking 50% of your execution time. Likewise, it isn't any good optimizing a VB program if the Visual Basic runtime libraries are taking most of the time. Even using C/C++, you need to be careful. MFC (the Microsoft Foundation Classes, Microsoft's application framework for C++) can take up to 70% of your program's execution time.

So first, you need to verify what is taking the time and make sure it is your program. If it isn't your program, then you have to either settle for what you have, or change your environment. You may need to switch languages, switch compilers, or write your own, faster routines instead of using the default libraries.
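As a rough first look at where the time goes, you can time each phase of a program separately. The sketch below uses only the standard `System.nanoTime()` call; the class name, the helper `timeNanos`, and the two phases being timed are made up for illustration. A real profiler will give a far more reliable picture, especially in Java, where warm-up of the virtual machine distorts short measurements.

```java
// Crude per-phase timing. The phases (filling and sorting an array) are
// placeholders; substitute the parts of your own program.
class PhaseTimerSketch {
    // Run a task once and return the elapsed time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        long fill = timeNanos(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = data.length - i;   // descending values
            }
        });
        long sort = timeNanos(() -> java.util.Arrays.sort(data));
        long total = Math.max(1, fill + sort);   // avoid dividing by zero
        System.out.println("fill: " + (100 * fill / total) + "% of measured time");
        System.out.println("sort: " + (100 * sort / total) + "% of measured time");
    }
}
```

Whichever phase dominates is the one worth your attention; tuning the other one cannot buy much.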

Then, the most important thing you can control is the algorithm you use. You should do a good deal of research and find the best way to do whatever it is that needs to be done. For example, it doesn't matter how much you optimize a sequential search: there are better ways of performing searches that will, on average, be faster, even if your sequential search is written in 100% optimized assembly language and the other search is in unoptimized Java. The largest and best performance gains are found not in coding close to the metal, so to speak, but in coming up with a better way to solve the problem, or in some cases, in solving a different problem that is close enough to the original that its solution can be used instead.
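To illustrate the point about searching, the sketch below counts how many elements each approach has to examine. I have picked binary search as the "better way" purely as an example; it assumes the data is already sorted, which the sequential search does not require. The class and method names are made up.

```java
// Counts the elements examined by a sequential search and by a binary search.
// Binary search is used here as one example of a better algorithm; it only
// works on sorted input.
class SearchComparisonSketch {
    // Sequential search: examine elements one by one from the front.
    static int linearProbes(int[] sorted, int target) {
        int probes = 0;
        for (int value : sorted) {
            probes++;
            if (value == target) {
                break;
            }
        }
        return probes;
    }

    // Binary search: halve the remaining range after every probe.
    static int binaryProbes(int[] sorted, int target) {
        int probes = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // midpoint without int overflow
            probes++;
            if (sorted[mid] == target) {
                break;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1_000_000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;
        }
        int target = sorted.length - 1;   // worst case for the sequential search
        System.out.println("sequential probes: " + linearProbes(sorted, target));
        System.out.println("binary probes:     " + binaryProbes(sorted, target));
    }
}
```

For a million sorted elements, the sequential search may examine all one million of them, while the binary search needs roughly twenty probes. No amount of instruction-level tuning of the sequential loop closes a gap of that size.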

This probably wasn't the answer you were looking for, but lots of luck anyway.

--Eric Tolman



NEWTON is an electronic community for Science, Math, and Computer Science K-12 Educators, sponsored and operated by Argonne National Laboratory's Educational Programs, Andrew Skipor, Ph.D., Head of Educational Programs.

For assistance with NEWTON, contact a System Operator (help@newton.dep.anl.gov) or Argonne's Educational Programs.

NEWTON AND ASK A SCIENTIST
Educational Programs
Building 360
9700 S. Cass Ave.
Argonne, Illinois
60439-4845, USA
Update: June 2012