@loboris Thank you very much! I suspect the cause is that the DMA needs more than 10 cycles to complete a transaction. The evidence:

1. When receiving fewer than 32 bytes, the transfer is fine; the data is held in the FIFO.
2. When transferring more bytes at 40 Mbit/s, bit 3 of the `spi->risr` register gets set, which does not happen in normal operation. I have no register definition, but I think the FIFO overruns and the DMA loses some bytes.
3. When DATALENGTH is set to 32 bits, so that each DMA transaction moves 4 bytes, no problem appears. This reduces the number of transactions by a factor of 4.
4. This assumption also explains why, as the SPI rate is raised, quad I/O mode fails first and dual mode fails next.

When it fails, the code loops forever waiting for the DMA to finish. Am I right?
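To make the arithmetic behind points 3 and 4 concrete, here is a rough sketch in C. It is not taken from the driver: the function names are hypothetical, and it assumes one DMA transaction per SPI frame, back-to-back frames with no gaps, and 1, 2, or 4 data lanes for standard, dual, and quad mode. It only shows how much time the DMA has per frame and how many transactions a buffer needs.

```c
#include <stdint.h>

/* Hypothetical helper: number of DMA transactions needed to move
 * total_bytes when each transaction carries one SPI frame of
 * frame_bits bits (assumption: one transaction per frame). */
static uint32_t dma_transactions(uint32_t total_bytes, uint32_t frame_bits)
{
    uint32_t frame_bytes = frame_bits / 8u;
    /* Round up so a partial last frame still counts as a transaction. */
    return (total_bytes + frame_bytes - 1u) / frame_bytes;
}

/* Hypothetical helper: time in nanoseconds between two completed frames,
 * i.e. the budget the DMA has to drain one FIFO entry.
 * lanes is 1 (standard), 2 (dual) or 4 (quad); sck_hz is the SPI clock.
 * Assumes frames arrive back to back with no idle clocks in between. */
static double frame_interval_ns(uint32_t frame_bits, uint32_t lanes,
                                double sck_hz)
{
    return (double)frame_bits * 1e9 / ((double)lanes * sck_hz);
}
```

Under these assumptions, at a 40 MHz clock an 8-bit frame completes every 200 ns in standard mode but every 50 ns in quad mode, which would fit quad failing first. Switching to 32-bit frames brings the quad interval back to 200 ns and cuts a 1024-byte transfer from 1024 transactions to 256.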