
A Digital Signal Processing (DSP) Processor Optimized For 64-Bit Operations


ORDINARY APPLICATION

Published

Filed on 13 November 2024

Abstract

The present invention relates to a 64-bit DSP processor (100) optimized for digital signal processing, featuring enhanced speed and memory efficiency. The processor integrates a high-speed Block Random Access Memory (BRAM) (110) for program and data storage, minimizing delays and supporting large data blocks. Its Arithmetic and Logic Unit (ALU) (120) uses Distributed Arithmetic (DA) to accelerate the multiplication operations essential to DSP. Built on a Harvard architecture, the processor separates program and data memory to improve throughput. The design includes a controller unit (140), a status register (150), and program memory with general-purpose registers (GPRs) (130) for flexible, precise instruction handling. Independent BRAM read and write ports allow simultaneous data access, while positive clock-edge operation and a two-cycle delay ensure synchronized processing. Additionally, configurable BRAM port widths and an imm_sel bit in the instruction register enhance adaptability for various DSP applications.

Patent Information

Application ID: 202441087529
Invention Field: COMPUTER SCIENCE
Date of Application: 13/11/2024
Publication Number: 47/2024

Inventors

• Dr M. Bharathi: Assistant Professor, Department of ECE, School of Engineering, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, India. Country: India. Nationality: India.
• Dr Yasha Jyothi M Shirur: Professor and Head, Dept. of ECE, B. N. M. Institute of Technology, Bengaluru-560070, India. Country: India. Nationality: India.
• Dr. N. Padmaja: Professor, Department of ECE, School of Engineering, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), A. Rangampet, Tirupati-517102, India. Country: India. Nationality: India.
• Dr Krithikaa Mohanarangam: Assistant Professor, Department of Electronics and Telecommunication Engineering, Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune, India. Country: India. Nationality: India.

Applicants

• Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College): IPR Cell, Mohan Babu University (Erstwhile Sree Vidyanikethan Engineering College), Tirupati, Andhra Pradesh, India - 517102. Country: India. Nationality: India.

Specification

Description: Figure 1 illustrates the structural components of the 64-bit digital signal processing (DSP) processor (100), which is designed to accelerate real-time data processing and enhance performance in DSP applications through a strategic combination of advanced memory structures, arithmetic capabilities, and robust control mechanisms. By integrating a specialized Block Random Access Memory (BRAM), an arithmetic and logic unit (ALU) optimized with distributed arithmetic, and additional supporting components, the processor is structured to meet the demands of DSP applications, which often involve handling large datasets and executing complex mathematical operations.
At the core of the processor's memory architecture is the BRAM (110), which serves as both program and data storage. Unlike standard RAM, BRAM allows high-speed, low-latency read and write operations, which is ideal for applications where rapid data access is crucial. The BRAM supports large blocks of data, minimizing the processing delays that would otherwise arise from memory constraints. The memory has independent read and write ports, so a read and a write can complete in the same clock cycle without creating a bottleneck; this effectively doubles data throughput and reduces operation time. This feature is especially advantageous in high-speed DSP applications, where data flow and accessibility are key to maintaining processing efficiency and accuracy.
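The effect of independent ports can be illustrated with a small Python behavioral model. This is a sketch under stated assumptions, not the patented design: the class `SimpleDualPortBRAM` is hypothetical, the 64 x 64-bit size simply mirrors the addra[5:0] and doutn[63:0] signals mentioned later in the text, and the read-first result on a same-address collision is an assumption (the collision behavior of real BRAM primitives depends on their configured write mode).

```python
from typing import List, Optional


class SimpleDualPortBRAM:
    """Hypothetical behavioral model: one write port and one independent read port."""

    def __init__(self, depth: int, width: int) -> None:
        self.mask = (1 << width) - 1  # stored words are truncated to the port width
        self.mem: List[int] = [0] * depth

    def clock(self, read_addr: int, write_addr: Optional[int] = None,
              write_data: int = 0) -> int:
        """One positive clock edge: a read and, optionally, a write in the same cycle."""
        read_data = self.mem[read_addr]  # assumed read-first: returns pre-write contents
        if write_addr is not None:
            self.mem[write_addr] = write_data & self.mask
        return read_data


ram = SimpleDualPortBRAM(depth=64, width=64)
ram.clock(read_addr=0, write_addr=5, write_data=123)  # read addr 0 while writing addr 5
same_cycle = ram.clock(read_addr=5, write_addr=5, write_data=456)  # collision: old value
next_cycle = ram.clock(read_addr=5)  # the second write is now visible
```

Here `same_cycle` is 123 and `next_cycle` is 456. Two accesses complete on every edge; a single-port memory would need two cycles for the same read/write pair, which is where the throughput gain comes from.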
The Arithmetic and Logic Unit (ALU) (120) is engineered with Distributed Arithmetic (DA) to increase its speed and efficiency in intensive arithmetic operations, such as the multiplications that are common in DSP tasks. DA breaks a sum of products into simpler steps: instead of using hardware multipliers, it looks up precomputed sums of the fixed coefficients, addressed one bit position of the inputs at a time, and combines them with shifts and additions. This reduces the hardware required for multiplication and allows faster, more efficient processing. To handle sequential tasks effectively, the ALU temporarily stores computational results in a dedicated data memory before transferring them to the general-purpose registers (GPRs). This prevents the GPRs from becoming overloaded and ensures that data is processed and stored efficiently. By offloading intermediate results to data memory, the processor minimizes disruptions and maintains a steady flow of operations, optimizing both speed and resource allocation.
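To make the DA idea concrete, the following Python sketch computes an inner product y = sum of A_k * x_k (the core of an FIR filter) without a single multiplication. It uses the textbook lookup-table, bit-serial formulation of DA, assumed here for illustration; the source does not describe the ALU's actual DA structure. Inputs are treated as two's-complement values `bits` wide, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_da_lut(coeffs: Sequence[int]) -> List[int]:
    """LUT[addr] = sum of coeffs[k] for every k whose bit is set in addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs: Sequence[int], xs: Sequence[int], bits: int) -> int:
    """Compute sum(coeffs[k] * xs[k]) with table lookups, shifts and adds only."""
    lut = build_da_lut(coeffs)  # precomputed once for fixed coefficients
    acc = 0
    for b in range(bits):
        # Form the LUT address from bit b of every input sample.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        term = lut[addr] << b  # weight the partial sum by 2**b
        # The MSB of a two's-complement number carries weight -2**(bits-1).
        acc += -term if b == bits - 1 else term
    return acc


y = da_inner_product([3, -2, 5], [4, -7, 10], bits=8)
```

The result equals 3*4 + (-2)*(-7) + 5*10 = 76. The table has 2^K entries for K coefficients, which is why DA suits fixed-coefficient filters with modest K; larger filters are usually split into several smaller tables.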
The processor also features essential control units, including a controller unit (140) and a status register (150), which manage the overall execution of instructions and control flow. The controller decodes opcodes and directs operations based on the instructions provided, ensuring that each task is completed in the proper sequence. The status register, in conjunction with the program counter (PC), enables advanced control functionalities like branching and jump instructions, which are crucial for handling diverse program sequences and making decisions based on data conditions. This control configuration allows the processor to execute a wide variety of instructions with precision and fluidity, making it adaptable to different DSP scenarios and processing demands. To further refine timing and synchronization, the status register incorporates a two-cycle delay for each operation. This delay standardizes the timing between steps, allowing sequential tasks to execute with precise intervals, reducing the potential for timing mismatches that could slow down operations or affect output accuracy. The processor is also designed to operate on the positive clock edge, ensuring that each instruction is executed at a consistent rate, which enhances the timing, accuracy, and performance of the processor. This design is especially useful in DSP applications that require real-time data processing, where timing is a critical factor in maintaining data integrity and operation flow.
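The text does not list the status flags, but the branch mnemonics in Table 1 (BONC, BONZ, BONN, BONV and their BONNC, BONNZ, BONNN, BONNV counterparts) suggest carry, zero, negative and overflow flags tested by "branch on" and "branch on not" conditions. Assuming that reading, the Python sketch below shows how a 64-bit ADD could set the flags and how each branch would test them; the names `add_with_flags` and `BRANCH_TAKEN` are hypothetical.

```python
from typing import Callable, Dict, Tuple

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63  # bit 63 is the sign bit of a 64-bit two's-complement word


def add_with_flags(a: int, b: int) -> Tuple[int, Dict[str, bool]]:
    """64-bit add returning the wrapped result and assumed C/Z/N/V flags."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    flags = {
        "C": full > MASK64,          # carry out of bit 63
        "Z": result == 0,            # result is zero
        "N": bool(result & SIGN64),  # result is negative (sign bit set)
        # signed overflow: both operands share a sign that the result lacks
        "V": bool(~(a ^ b) & (a ^ result) & SIGN64),
    }
    return result, flags


# Assumed meaning of the branch mnemonics: "branch on (not) <flag>".
BRANCH_TAKEN: Dict[str, Callable[[Dict[str, bool]], bool]] = {
    "BONC": lambda f: f["C"], "BONNC": lambda f: not f["C"],
    "BONZ": lambda f: f["Z"], "BONNZ": lambda f: not f["Z"],
    "BONN": lambda f: f["N"], "BONNN": lambda f: not f["N"],
    "BONV": lambda f: f["V"], "BONNV": lambda f: not f["V"],
}

result, flags = add_with_flags(MASK64, 1)  # all-ones + 1 wraps to zero
```

Adding 1 to the all-ones word wraps to zero with carry set, so BONZ and BONC would be taken while BONNZ would not.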
A unique feature of the processor is its built-in test bench code embedded in the BRAM program memory. This predefined operational sequence facilitates testing and verification, allowing for comprehensive assessment of the processor's functionality under simulated conditions before actual use. This test code provides a structured framework to validate each component's performance, identify any potential issues, and confirm that the processor meets the required specifications. This pre-testing phase is critical for DSP applications, where processors are expected to handle intensive data loads with high reliability.
The processor's integration of BRAM, a DA-optimized ALU, and an advanced control system positions it as a powerful tool for digital signal processing. This combination of high-speed memory access, efficient computation, precise timing, and robust control mechanisms makes it well-suited for complex DSP tasks in real-time environments, such as telecommunications, audio and video processing, and scientific computations. The processor's architecture and functionalities provide a versatile and scalable solution to meet the evolving needs of DSP applications.
Figure 2 illustrates the instruction register (IR) format in the block RAM.
The fields of the IR are as follows:
• opcode: The operation to be performed.
• dst: The register into which the result is stored after the operation is performed.
• src1: The first input is taken from this register.
• src2: The second input is taken from this register.
• Reg mode (0): The register mode, selected when imm_sel is 0.
• Imm mode (1): The immediate mode, selected when imm_sel is 1.
• imm_sel: Immediate select bit. It decides whether the inputs are taken from the two source registers or from the values provided directly by the user in the instruction.
The opcode occupies the first five bits of the instruction register and defines the operation to be performed on the inputs. Instructions are formatted and loaded from a .COE file into the program memory. The destination register (dst) specifies where the output is stored, while src1 and src2 hold the initial operand values. The instruction register also has an imm_sel bit at position 16, which selects whether bits 0-15 are treated as immediate data or as register-based data.
In general, processors operate by executing software instructions stored in memory. These instructions are fetched, decoded, and executed sequentially. Memory also stores data accessible to the processor; here this includes Block RAM (BRAM) and Block ROM, which can be used for data storage and retrieval. Program instructions access, modify, and store data in memory under a controlled sequence of execution. This sequence is determined either by the instruction order or by branches, conditional branches, and subroutine calls, which alter the control flow. The instruction set defines the specific tasks the processor can carry out, and an instruction decoder interprets each instruction and generates the control signals that execute the operation.
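A decoder splits an instruction word into these fields with shifts and masks. In the Python sketch below, only the imm_sel position (bit 16) and the immediate field (bits 15:0) come from the text; the opcode [31:27], dst [26:22], src1 [21:17] and src2 [15:11] positions are a hypothetical layout chosen for illustration, and the 64-bit hardware word may place them elsewhere. The names `IRFields`, `field` and `decode` are also hypothetical.

```python
from typing import NamedTuple


class IRFields(NamedTuple):
    opcode: int
    dst: int
    src1: int
    imm_sel: int
    src2: int
    imm: int


def field(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (inclusive bit range, Verilog-style)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(ir: int) -> IRFields:
    # imm_sel (bit 16) and imm (bits 15:0) follow the text; every other
    # position is a HYPOTHETICAL choice for illustration.
    return IRFields(
        opcode=field(ir, 31, 27),
        dst=field(ir, 26, 22),
        src1=field(ir, 21, 17),
        imm_sel=field(ir, 16, 16),
        src2=field(ir, 15, 11),  # meaningful only in register mode
        imm=field(ir, 15, 0),    # meaningful only in immediate mode
    )


fields = decode(0x08830005)
```

Under this layout, 0x08830005 decodes to opcode 00001 (ADD), dst 2, src1 1, imm_sel 1 and immediate 5. Because src2 and the immediate overlap in bits 15:11, the controller must consult imm_sel before using either.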
Table 1 - Instruction Set
INSTRUCTION  OPCODE
MOV          00000
ADD          00001
SUB          00010
MUL          00011
AND          00100
OR           00101
XOR          00110
NAND         00111
NOR          01000
XNOR         01001
NOT          01010
LOAD         01011
STORE_IMM    01100
STORE_REG    01101
JUMP         01110
BONC         01111
BONZ         10000
BONN         10001
BONV         10010
BONNC        10011
BONNZ        10100
BONNN        10101
BONNV        10110
HALT         10111
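Table 1 can be turned into a small assembler that writes a .COE file for the program memory. The Python sketch below is an illustration under assumptions: AND is encoded as 00100, the one value missing from the otherwise sequential numbering (00010 is already SUB); the field layout is hypothetical (opcode [31:27], dst [26:22], src1 [21:17], imm_sel [16], src2 [15:11] or immediate [15:0], with only imm_sel and the immediate positions taken from the text); and `encode` and `to_coe` are hypothetical names. The output uses the Xilinx .COE keywords `memory_initialization_radix` and `memory_initialization_vector`.

```python
from typing import List, Optional

# Opcodes from Table 1; AND is taken as 00100 (00010 is already SUB).
OPCODES = {
    "MOV": 0b00000, "ADD": 0b00001, "SUB": 0b00010, "MUL": 0b00011,
    "AND": 0b00100, "OR": 0b00101, "XOR": 0b00110, "NAND": 0b00111,
    "NOR": 0b01000, "XNOR": 0b01001, "NOT": 0b01010, "LOAD": 0b01011,
    "STORE_IMM": 0b01100, "STORE_REG": 0b01101, "JUMP": 0b01110,
    "BONC": 0b01111, "BONZ": 0b10000, "BONN": 0b10001, "BONV": 0b10010,
    "BONNC": 0b10011, "BONNZ": 0b10100, "BONNN": 0b10101, "BONNV": 0b10110,
    "HALT": 0b10111,
}


def encode(op: str, dst: int = 0, src1: int = 0, src2: int = 0,
           imm: Optional[int] = None) -> int:
    """Pack one instruction using a HYPOTHETICAL field layout."""
    word = (OPCODES[op] << 27) | ((dst & 0x1F) << 22) | ((src1 & 0x1F) << 17)
    if imm is not None:
        word |= (1 << 16) | (imm & 0xFFFF)  # imm_sel = 1: immediate mode
    else:
        word |= (src2 & 0x1F) << 11          # imm_sel = 0: register mode
    return word


def to_coe(words: List[int], width_bits: int = 64) -> str:
    """Render instruction words as a hexadecimal .COE initialization file."""
    digits = width_bits // 4
    vector = ",\n".join(f"{w:0{digits}X}" for w in words)
    return ("memory_initialization_radix=16;\n"
            "memory_initialization_vector=\n" + vector + ";\n")


program = [
    encode("MOV", dst=1, imm=7),
    encode("ADD", dst=2, src1=1, imm=5),
    encode("ADD", dst=3, src1=1, src2=2),
    encode("HALT"),
]
print(to_coe(program))
```

Each word is printed as 16 hexadecimal digits to match the 64-bit program memory width; a different width only changes `width_bits`.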

Figure XX (4.7) illustrates the architecture of the 64-bit DSP processor using DA. The ADD operation is described by the following HDL fragment:

    `add: begin
        if (`imm_sel == 1'b0)
            GPR[`rdst] = GPR[`src1] + GPR[`src2];  // register mode
        else
            GPR[`rdst] = GPR[`src1] + IR[15:0];    // immediate mode
    end

The code above shows how the result of ADD is stored in the destination register among the GPRs. When the immediate select bit (imm_sel) in the instruction register is 0 (register mode), the values in source register 1 and source register 2 are added and the sum is stored in the specified destination register.
When imm_sel is 1 (immediate mode), the value in source register 1 is added to the 16-bit immediate value supplied by the user in IR[15:0], and the sum is stored in the specified destination register.
The proposed processor consists of a Block RAM, a Block ROM, general-purpose registers (GPRs), an instruction register (IR), a program counter (PC), a status register, and an arithmetic and logic unit (ALU). It uses a Harvard architecture. The instruction register provides the program's instructions; these instructions are written to a coefficient (.COE) file, which is then loaded into the program memory.
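The ADD behavior can be modeled in a few lines of Python. This sketch assumes, following the register/immediate mode definitions given for the IR, that imm_sel = 0 selects src2 and imm_sel = 1 selects the 16-bit immediate. It also assumes the immediate is zero-extended (as Verilog does when an unsigned IR[15:0] is added to a wider register), that the sum wraps to 64 bits, and that there are 32 GPRs; `execute_add` is a hypothetical name.

```python
from typing import List

MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def execute_add(gpr: List[int], dst: int, src1: int, src2: int,
                imm_sel: int, imm16: int) -> None:
    """Hypothetical ADD model: register mode (imm_sel=0) or immediate mode (imm_sel=1)."""
    if imm_sel == 0:
        operand = gpr[src2]       # register mode: second operand from src2
    else:
        operand = imm16 & 0xFFFF  # immediate mode: IR[15:0], zero-extended
    gpr[dst] = (gpr[src1] + operand) & MASK64  # wrap the sum to 64 bits


regs = [0] * 32
regs[1], regs[2] = 10, 20
execute_add(regs, dst=3, src1=1, src2=2, imm_sel=0, imm16=0)       # 10 + 20
execute_add(regs, dst=4, src1=1, src2=0, imm_sel=1, imm16=0x0005)  # 10 + 5
```

After these calls, `regs[3]` holds 30 and `regs[4]` holds 15. If the hardware instead sign-extends the immediate, only the `imm16 & 0xFFFF` line would change.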
The processor executes test bench code loaded into its Block RAM (BRAM) program memory, with each instruction formatted according to the instruction register layout.

The sequence of operations is as follows:

1. When the enable pin of the program memory is 1, the operation starts.
2. When enrom = 1, the address on addra[5:0] selects the next instruction, which the program memory outputs on doutn[63:0].
3. The test bench code specifies which operations are to be performed and the order in which they are executed.
4. Depending on the immediate select bit (imm_sel) of the instruction register, the operands are taken either from the general-purpose registers or from the immediate data provided in the instruction.
5. The controller reads the instruction register and operates according to the opcode given in the instruction.
6. If the opcode is a store or move operation, the controller accesses the GPRs directly and performs the operation.
7. If the opcode specifies an operation on the inputs, the controller reads the values from the GPRs and sends them to the arithmetic and logic unit (ALU).
8. The ALU performs the required operation and temporarily stores the result in the data memory.
9. In summary, the ALU accesses the data in the GPRs, executes the instructed operation, and stores the output in the BRAM used as data memory, which acts as temporary storage until the result is stored permanently in the GPRs.
10. After the output has been written into the RAM using the write enable pin, it can be observed by asserting the read enable pin.
11. For permanent storage, the RAM's doutp output sends the data to the GPRs, where it is stored in the specified destination register.
12. Note that while an operation is being performed in the ALU, the next instruction is simultaneously taken from the test bench code (.COE file), so instruction execution continues without pause.
13. When the program encounters a jump or branch instruction, the program counter is loaded with the target address so that execution continues smoothly from there.
14. The proposed processor uses the lr and blr registers to perform branch and jump operations.
15. The lr and blr registers hold the address to which the program jumps to execute a given instruction.
16. The current PC value is saved and the PC jumps to the indicated address to continue execution. After that execution is complete, the PC returns to the original position and continues executing the program until it encounters another lr or blr value.
17. The controller uses the status register to indicate the state it is in.
18. The processor inherently incurs some delay during operation.
19. To keep these delays from increasing further, the status register introduces a two-cycle delay after every operation and adjusts the timing accordingly.
20. A clock synchronizes the execution of the processor, and every operation is executed on the positive edge of the clock.
21. After each two-cycle delay, the next instruction is pushed into the instruction register and executed according to its opcode. The PC is incremented after every operation.
22. This processor implementation has proved effective in overcoming the problems of speed and delay.
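The numbered sequence can be condensed into a toy instruction-level simulator. It is a Python sketch under assumptions the text does not pin down: instructions are tuples rather than encoded words, each instruction costs one execution edge plus the two-cycle delay (three cycles in total), ALU results pass through a per-register data-memory slot before reaching the GPRs, and only MOV, ADD, SUB, JUMP and HALT are modeled. It does not model the overlap of fetch and execution described in step 12, and `run` is a hypothetical name.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1
DELAY_CYCLES = 2  # two-cycle delay after each operation, as described in the text

# (op, dst, src1, arg, imm_sel); arg is src2 in register mode, the immediate
# value in immediate mode, or the target address for JUMP (assumed encoding).
Instr = Tuple[str, int, int, int, int]


def run(program: List[Instr], num_gprs: int = 32,
        max_steps: int = 1000) -> Tuple[List[int], int]:
    """Step through a program; return the final GPRs and the elapsed clock cycles."""
    gpr = [0] * num_gprs
    data_mem = [0] * num_gprs  # hypothetical ALU result buffer, one slot per GPR
    pc, cycles = 0, 0
    for _ in range(max_steps):
        op, dst, src1, arg, imm_sel = program[pc]  # fetch the instruction at PC
        cycles += 1 + DELAY_CYCLES  # one positive edge to execute, then the delay
        if op == "HALT":
            break
        if op == "JUMP":
            pc = arg  # load the PC with the jump target
            continue
        operand = arg & 0xFFFF if imm_sel else gpr[arg]
        if op == "MOV":
            gpr[dst] = operand  # controller writes the GPR directly
        elif op in ("ADD", "SUB"):
            a = gpr[src1]
            result = a + operand if op == "ADD" else a - operand
            data_mem[dst] = result & MASK64  # ALU output buffered in data memory
            gpr[dst] = data_mem[dst]  # then committed to the destination GPR
        else:
            raise ValueError(f"opcode not modeled in this sketch: {op}")
        pc += 1  # the PC increments after every operation
    return gpr, cycles


prog = [("MOV", 1, 0, 7, 1), ("MOV", 2, 0, 5, 1),
        ("ADD", 3, 1, 2, 0), ("SUB", 4, 1, 2, 0), ("HALT", 0, 0, 0, 0)]
regs, n_cycles = run(prog)
```

With this program, registers 3 and 4 end up holding 12 and 2, and the five instructions take 15 cycles under the three-cycles-per-instruction assumption.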
Figure 3 illustrates the schematic and data path flow of the 64-bit DSP processor. The operation of the 64-bit DSP processor is divided into several essential phases that streamline its functionality and performance. First, Program Memory Initialization is conducted, where the processor uses Block RAM (BRAM) to load the test bench code. This code governs the behavior and sequence of operations, setting the foundation for efficient processing.
The next phase is Instruction Fetch, where instructions are retrieved from program memory based on the current value of the program counter (PC). These instructions dictate specific actions, such as accessing data from registers, assigning values, or performing arithmetic and logic operations. Each instruction forms a step in the broader sequence of the DSP processor's operations.
Following instruction fetch, the Controller Function takes over. The controller interprets the instructions and manages execution flow by retrieving data from general-purpose registers (GPRs) and directing operations to the Arithmetic and Logic Unit (ALU) as specified. This controller ensures that the right data is sent to the right units at the right time, keeping the processor running efficiently.
At the ALU Execution stage, the Arithmetic and Logic Unit (ALU) performs specified arithmetic and logical operations on data stored in the registers. The results of these operations are temporarily stored in data memory, allowing for seamless access in subsequent steps. This temporary storage supports continuous processing and minimizes delays.
The Data Memory Access phase involves using data memory as a buffer for processed results. This setup is crucial because it allows the processor to maintain a high throughput by keeping intermediate data readily available for ongoing operations. Afterward, in the Output Retrieval phase, processed data is extracted from either data memory or registers, depending on the operation requirements, and is prepared for further use or final output.
Program Flow Control is managed by branching and jumping instructions, which determine the order of execution within the processor. Specialized registers, such as lr and blr, are employed to direct the program flow, enabling efficient execution of complex processing tasks. This phase allows the processor to handle conditional operations and loops smoothly.
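Steps 14 to 16 describe saving the PC, jumping to the address held in lr or blr, and later returning. The text does not spell out how lr and blr differ, so the following Python sketch models only a single link register; the class and method names are hypothetical, and resuming at the instruction after the branch (rather than at the branch itself) is an assumption.

```python
class LinkRegisterPC:
    """Hypothetical model of subroutine-style flow using one link register (lr)."""

    def __init__(self) -> None:
        self.pc = 0
        self.lr = 0

    def step(self) -> None:
        self.pc += 1  # normal sequential execution

    def branch_and_link(self, target: int) -> None:
        self.lr = self.pc + 1  # save where execution should resume
        self.pc = target       # continue at the indicated address

    def return_from_link(self) -> None:
        self.pc = self.lr  # resume the original instruction stream


m = LinkRegisterPC()
m.step()
m.step()                # pc = 2
m.branch_and_link(40)   # lr = 3, pc = 40
m.step()                # execute one instruction of the routine, pc = 41
m.return_from_link()    # pc = 3
```

A single link register supports only one level of calls; nesting would require saving lr elsewhere, which may be one role of a second register such as blr, but the source does not say.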
Status Register Management monitors the processor's condition and introduces necessary delays to ensure proper timing and synchronization. This component is essential in fine-tuning operations, especially in systems where precise timing is critical.
Finally, Clock Synchronization uses a clock signal to coordinate all activities within the processor, typically initiating actions on the positive edge. This synchronization ensures that each component of the processor works in harmony, resulting in consistent and predictable operation timing.
By following this structured approach, the processor enhances performance, with a focus on reducing latency and improving precision. The architecture leverages specialized hardware, such as a 64-bit ALU, to boost processing power, making this DSP processor highly efficient and effective for digital signal processing applications.
Figure 4 illustrates the method for optimizing digital signal processing in a 64-bit processor. The invention describes an optimized method for digital signal processing in a 64-bit DSP processor, where data storage, arithmetic operations, and memory architecture are designed to maximize processing speed and efficiency in DSP applications. This approach leverages a series of steps that work in tandem to enhance the performance of DSP tasks, which typically require fast, precise handling of large volumes of data. The method begins with storing data in Block RAM (BRAM), which is specifically chosen for its high-speed access and efficient memory utilization capabilities. Unlike traditional memory, BRAM can quickly store and retrieve large data blocks, which is crucial in DSP operations where minimizing processing delays is essential. This setup not only accelerates data retrieval but also optimizes memory use by eliminating the need for frequent memory swaps, thereby allowing the processor to manage and access large datasets with minimal latency.
To handle computationally intensive tasks such as multiplications, the method incorporates Distributed Arithmetic (DA) within an Arithmetic and Logic Unit (ALU). DA breaks down complex multiplication operations into manageable steps, significantly reducing the processing time. This is particularly effective in DSP applications, where multiplication operations are common. By using DA, the ALU can perform these tasks more efficiently, boosting the overall processing speed and reducing the hardware complexity required for mathematical calculations, making the processor faster and more resource-efficient.
In order to enhance data throughput and minimize operational delays, the method employs a Harvard architecture. This architecture separates the program memory and data memory, allowing the processor to access instructions and data simultaneously rather than sequentially. In traditional architectures, shared memory access can lead to bottlenecks, but the Harvard architecture effectively mitigates this by keeping program and data pathways independent. This separation of memory paths is key to optimizing data flow, ensuring that the processor can fetch instructions and data concurrently, leading to faster and more efficient DSP processing.
The method also includes a controller and status register that work together to manage the flow of operations within the processor. The controller coordinates instruction execution, while the status register monitors and maintains the state of the processor. Together, they facilitate advanced control features, such as branching and jump instructions, which are managed by the Program Counter (PC). The PC plays a critical role in control flow by determining the sequence of operations and ensuring that the processor can handle complex instruction sequences, such as branching, with precision. This coordination enhances the processor's ability to execute diverse tasks seamlessly, a requirement for dynamic DSP applications.
To ensure synchronization and precise timing for all operations, the method utilizes a clock signal with a two-cycle delay. This delay is introduced intentionally to synchronize each step of the process, providing a stable timing mechanism for sequential tasks. The two-cycle delay standardizes the intervals between operations, allowing the processor to execute each instruction with precision. This timing control is particularly beneficial in DSP applications, where processing accuracy and consistency are critical for reliable output.
In addition to these steps, the method includes a feature that allows the BRAM to be configured with adaptable port widths, providing flexibility in memory design. This adaptability enables the processor to adjust its memory configuration based on specific DSP application requirements. By allowing varied port widths, the BRAM can be tailored to meet different data handling needs, optimizing memory usage for particular tasks and ensuring the processor remains versatile across various DSP scenarios.
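The trade-off behind adaptable port widths is simple arithmetic: for a fixed capacity, depth = capacity / width, and the address bus needs ceil(log2(depth)) bits. The Python sketch below applies this to the program memory implied by the text (64 words from addra[5:0] times 64 bits from doutn[63:0], i.e. 4096 bits). The function name is hypothetical, and real BRAM primitives support only a vendor-specific set of widths, often including parity bits.

```python
from typing import Tuple


def port_geometry(capacity_bits: int, width: int) -> Tuple[int, int]:
    """Return (depth, address bits) for a memory of fixed capacity at a given port width."""
    if width <= 0 or capacity_bits % width:
        raise ValueError("width must be positive and divide the capacity")
    depth = capacity_bits // width
    addr_bits = (depth - 1).bit_length()  # smallest n with 2**n >= depth
    return depth, addr_bits


PROGRAM_MEM_BITS = 64 * 64  # 64 words (addra[5:0]) x 64 bits (doutn[63:0])
for w in (64, 32, 16, 8):
    depth, abits = port_geometry(PROGRAM_MEM_BITS, w)
    print(f"width {w:2d} -> depth {depth:3d}, address bits {abits}")
```

At 64-bit width this gives 64 words and 6 address bits, matching addra[5:0]; narrowing the port to 16 bits gives 256 words and 8 address bits from the same storage.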
Finally, the method enhances flexibility and accuracy in data handling by equipping the instruction register with an imm_sel bit, which allows the processor to choose between register-based data and immediate data. This control bit determines where the processor retrieves its operands (from existing registers or as immediate values) based on the needs of the current instruction. The imm_sel bit ensures that the processor can handle both pre-stored and real-time data inputs, enhancing its precision and flexibility, which are crucial in applications requiring nuanced data manipulation. This method collectively improves the speed, flexibility, and efficiency of a 64-bit DSP processor, making it particularly effective for handling the demands of digital signal processing applications. Through this combination of optimized memory, arithmetic execution, control flow management, and adaptable configurations, the processor is well-equipped to provide high-performance results in complex DSP environments.
Claims: We claim:
1. A digital signal processing (DSP) processor (100) optimized for 64-bit operations, comprising:
a) a Block Random Access Memory (BRAM) (110) configured for program and data storage, wherein the BRAM facilitates high-speed read and write operations to minimize processing delays and support large data blocks;
b) an arithmetic and logic unit (ALU) (120) configured to execute distributed arithmetic (DA) for efficient multiplication operations, enhancing processing speed in DSP applications;
c) a program memory and general-purpose registers (GPRs) (130) with instruction registers capable of loading and executing instructions, wherein the instruction register manages immediate and register-based data retrieval for operations; and
d) a controller unit (140) and status register (150) to coordinate instruction execution and manage control flow, with integrated branch and jump instructions utilizing program counter (PC) values.
2. The DSP processor, as claimed in claim 1, wherein the BRAM includes independent read and write ports for simultaneous data access, thereby increasing the data handling capacity and further reducing operation time.
3. The DSP processor, as claimed in claim 1, wherein the ALU is configured to temporarily store computation results in a data memory unit before final storage in the general-purpose registers (GPRs), ensuring efficient data handling for sequential operations.
4. The DSP processor, as claimed in claim 1, further comprising a status register configured to introduce a two-cycle delay for each operation, allowing synchronization of sequential steps without excessive delay.
5. The DSP processor, as claimed in claim 1, wherein the processor operates on a positive clock edge for each instruction execution step, optimizing the timing and performance of the processor.
6. The DSP processor, as claimed in claim 1, further comprising test bench code within the BRAM program memory, which predefines the operation sequence for testing and verifying processor functionality.
7. A method for optimizing digital signal processing in a 64-bit processor, comprising the steps of:
a) storing data in Block RAM (BRAM) for high-speed access and efficient memory utilization;
b) executing distributed arithmetic (DA) within an arithmetic and logic unit (ALU) to accelerate multiplication operations and reduce processing time for DSP applications;
c) utilizing a Harvard architecture to separate program and data memory, thus minimizing delays and optimizing data throughput;
d) coordinating operations through a controller and status register, including the management of branching and jump instructions with a program counter (PC) for control flow; and
e) executing operations with a clock signal that synchronizes each step and includes a two-cycle delay for precise timing.
8. The method as claimed in claim 7, further comprising the step of configuring the BRAM to allow adaptable port widths, enabling flexible memory configuration based on DSP application requirements.
9. The method as claimed in claim 7, wherein the instruction register includes an imm_sel bit for selecting between register-based and immediate data for operations, thereby enhancing flexibility and precision in instruction processing.

Documents

• 202441087529-COMPLETE SPECIFICATION [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-DECLARATION OF INVENTORSHIP (FORM 5) [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-DRAWINGS [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-FORM 1 [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-FORM FOR SMALL ENTITY [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-FORM FOR SMALL ENTITY(FORM-28) [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-FORM-9 [13-11-2024(online)].pdf (13/11/2024)
• 202441087529-REQUEST FOR EARLY PUBLICATION(FORM-9) [13-11-2024(online)].pdf (13/11/2024)
