Memory Access Regarding Rank/Bank Combinations

Guest
Posted: Wed Nov 19, 2008 10:19 pm    Post subject: Memory Access Regarding Rank/Bank Combinations

Hi, Group,

Sorry for the confusing title.

My question is based on the following premise (which may not be true):

Each physical memory address maps to a specific rank/bank pair. The
bandwidth to each rank/bank pair is limited. Memory accesses that hit
multiple rank/bank pairs in parallel achieve better memory
throughput.

Is there any code that can demonstrate this effect?

MitchAlsup
Posted: Thu Nov 20, 2008 2:30 am    Post subject: Re: Memory Access Regarding Rank/Bank Combinations

Let's at least get the terminology correct.

Each DRAM DIMM is assigned a physical address range. Each DRAM
contains at least 4 banks. Each DRAM bank can be simultaneously busy
with slight offsets wrt other banks. A DIMM can contain 1 or 2 ranks
of DRAMs.

CPU and I/O accesses are routed to the appropriate DRAM DIMM by the
physical address bits via a bus or via an interconnect fabric.

Each northbridge makes at least one routing decision per cycle; one
request, one response, and one unit of data is more typical.

There can be up to 4 DRAM DIMMs on a memory controller, and there can
be several memory controllers in the system.

Accesses to the same memory controller arrive one at a time.
Accesses to all the DRAM DIMMs on a memory controller begin one at a
time.
Accesses to DRAM DIMMs on a memory controller proceed in parallel
when bank conflicts do not occur.

Accesses to different memory controllers can arrive simultaneously.
Accesses to different DRAM DIMMs on different memory controllers can
begin simultaneously.

The data bus connecting the DRAM DIMMs has to be electrically "turned
around" when the master on the bus is changed {Read to Write, or DIMM1
to DIMM2 to DIMM3}. Note that write bursting across DIMMs does not have
to turn the bus around.

You can make a model using the above assumptions and be pretty
accurate.

Applications that demonstrate utility: vector strip-mine codes--of
which memory block move (page copy) is the degenerate subset.

davewang202
Posted: Thu Nov 20, 2008 8:56 am    Post subject: Re: Memory Access Regarding Rank/Bank Combinations

A couple of tiny nitpicks.

On Nov 19, 6:30 pm, MitchAlsup <MitchAl...@aol.com> wrote:
Quote:
Let's at least get the terminology correct.

Each DRAM DIMM is assigned a physical address range. Each DRAM
contains at least 4 banks. Each DRAM bank can be simultaneously busy
with slight offsets wrt other banks. A DIMM can contain 1 or 2 ranks
of DRAMs.

Up to 4 ranks now.

There exist quad-rank FBDIMMs, DDR2 RDIMMs, and "other types of
quad-rank DIMMs".

Quote:
The data bus connecting the DRAM DIMMs has to be electrically "turned
around" when the master on the bus is changed {Read to Write, or DIMM1
to DIMM2 to DIMM3}. Note that write bursting across DIMMs does not have
to turn the bus around.

Idle cycles (Turnaround times) were not needed between write bursts to
different DIMMs in DDR SDRAM DIMMs, because the DQ termination
topology is static.

Idle cycles (Turnaround times) are used between write bursts to
different DIMMs in DDR2 and DDR3 SDRAM memory systems, because you
need time to change over the termination topology to make it
appropriate to write to different DIMMs (ODT switching time).

Guest
Posted: Thu Nov 20, 2008 10:31 am    Post subject: Re: Memory Access Regarding Rank/Bank Combinations

Thanks for the reply.

I'm dealing with an interesting problem related to this.

Suppose we don't know which bit(s) of a physical address decide the
bank(s) to be accessed. Somehow I want to "guess" those bit(s).

Here is my idea, although it doesn't seem to work...

First, make sure all the writes pass straight through the cache and
reach the memory controller. What I do here is use the Memory Type
Range Registers (MTRRs) to mark the memory grabbed from the system as
uncacheable/write-through. As a result, all writes go to the memory
controller directly.

Second, for a chunk of memory, try two different memory access
patterns, as explained in the following code:

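        /* chunk: start of the uncached region, COUNT words long (set up elsewhere).
         * v should be a volatile pointer so the compiler keeps every store. */
        do_gettimeofday(&t0);
        while (loops--) {
                v = chunk;
                counter = COUNT;
                /* stop STRIDE words before the end so v + STRIDE stays in the chunk */
                while (counter-- > STRIDE) {
                        *v = 0xdeadbeef;
                        *(v + STRIDE) = 0xdeadbeef;
                        v++;
                }
        }
        do_gettimeofday(&t1);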

By using different STRIDEs, the two writes in the above code could
possibly hit two different banks, so the memory throughput might
increase.

However, running the above code with different STRIDEs, I got almost
the same throughput :(

Could anyone give me a hint on this? What's wrong with my assumption
and idea?

davewang202
Posted: Fri Nov 21, 2008 3:53 pm    Post subject: Re: Memory Access Regarding Rank/Bank Combinations

The problem is your basic assumption that the physical address bits
that decode to the bank address are static, and that you can take any
physical page, bang on it, and discover which address bits decode as
the bank address bits. I don't think you can make that presumption.
Memory controllers built in the last couple of years have incorporated
the bank address XOR'ing concept to scatter the bank address IDs
around, and that makes a software-based discovery approach rather
difficult.

Here's Zhang's paper that discusses the rationale and basic concept of
the bank address scattering scheme.

http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/abs00-7.html

David

Guest
Posted: Mon Nov 24, 2008 8:34 pm    Post subject: Re: Memory Access Regarding Rank/Bank Combinations

Thanks for your reply.

I just got the idea of XOR interleaving through reading your book^^.
It seems that Sun uses this in some of its chipsets.

However, I couldn't find any Intel documents mentioning that this kind
of technique is used in their chipsets.

If possible, could you point me to evidence that Intel also uses this
mechanism?