MAPPING SCHEME FOR NON-VOLATILE IN-MEMORY COMPUTING OPTIMIZATION
ORDINARY APPLICATION
Published
Filed on 9 November 2024
Abstract
The present disclosure presents a method (100c) for optimizing in-memory computation (IMC) by integrating storage and computation within the same memory array, improving energy efficiency, latency, and area usage. Leveraging spin-transfer torque magnetic tunnel junctions (STT-MTJs) paired with CMOS technology, the IMC system performs Boolean arithmetic and other computational tasks directly within the array, effectively reducing data transfer energy. Two mapping techniques are proposed for an LF-PP adder design utilizing majority logic, with configurations that support both high-density and high-speed operations. The design minimizes logical depth, the number of gates, and memory cell requirements, leading to a compact memory array footprint while maintaining adaptability for diverse applications. The method (100c) enables parallel row and column computations within the memory array, effectively reducing the utilized area while enhancing performance and energy efficiency, making it suitable for arithmetic-intensive tasks in energy-conscious, high-performance computing environments.
Patent Information
| Field | Value |
|---|---|
| Application ID | 202441086517 |
| Invention Field | COMPUTER SCIENCE |
| Date of Application | 09/11/2024 |
| Publication Number | 46/2024 |
Inventors
| Name | Address | Country | Nationality |
|---|---|---|---|
| ALLA SRIJA | Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India. | India | India |
| VINOD KUMAR JOSHI | Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India. | India | India |
| SOMASHEKARA BHAT | Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India. | India | India |
Applicants
| Name | Address | Country | Nationality |
|---|---|---|---|
| Manipal Academy of Higher Education | Madhav Nagar, Manipal, 576104, Karnataka, India. | India | India |
Specification
Description:
TECHNICAL FIELD
[0001] The present disclosure generally relates to in-memory computing (IMC) and memory array design, focusing on efficient memory-based computations. More particularly, it pertains to bi-directional processing and mapping techniques to reduce energy consumption, optimize area use, and minimize latency.
BACKGROUND
[0002] As CMOS technology approaches its theoretical miniaturization limits, the escalating power dissipation and limitations of processor-centric architectures, such as von Neumann systems, hinder further performance gains. These systems face significant "power" and "memory" walls due to constant data movement between the processor and memory, resulting in bottlenecks and inefficiencies. To address these challenges, the field has shifted toward In-Memory Computing (IMC), a novel approach that executes logic and arithmetic operations within the memory array itself. This advancement minimizes data transfer and, as a result, significantly enhances both computational speed and energy efficiency.
[0003] Non-volatile memory technologies like Resistive RAM (RRAM) and Magnetic RAM (MRAM) offer promising IMC solutions by allowing multi-level logic operations within the memory array. These emerging technologies reduce static power consumption while enabling arithmetic tasks (addition, multiplication, and more) directly in memory. Advanced designs, such as RRAM-based parallel prefix adders utilizing majority gates, have demonstrated latency improvements over traditional ripple-carry adders. Although effective, these methods often lead to increased memory array area requirements, driving the need for optimization.
[0004] The present disclosure proposes a refined in-memory mapping method that leverages bi-directional computing to simultaneously perform row and column operations, optimizing energy efficiency, area, and latency. By reducing the number of required write operations, this solution improves area utilization and meets the latency demands of data-intensive applications, overcoming the limitations of conventional architectures. The proposed approach offers application-specific flexibility while maintaining low power consumption, setting a new standard in memory array design by minimizing data movement and enabling efficient in-memory processing.
[0005] Therefore, there is a need to overcome the above-mentioned problems with a solution that provides a thorough and effective approach, establishing a new standard in memory array technology. Its capacity to optimize energy consumption, enhance area utilization, streamline mapping processes, and offer application-specific flexibility, all while maintaining low latency, sets it apart from conventional memory array designs.
OBJECTS OF THE PRESENT DISCLOSURE
[0006] Some of the objects of the present disclosure, which at least one embodiment herein satisfy are as listed herein below.
[0007] A general object of the present disclosure is to introduce bi-directional computing within memory arrays, allowing simultaneous row-wise and column-wise operations that significantly improve performance and reduce area and energy consumption by minimizing the number of write operations.
[0008] Another object of the present disclosure is to implement two distinct mapping techniques (M.T-1 and M.T-2) that cater to different application needs, either minimizing the processing area used or optimizing for high performance, thus enabling flexibility in design and area conservation, especially in compact or power-sensitive devices.
[0009] Another object of the present disclosure is to decrease the number of write operations by employing a memory mapping approach that either avoids overwriting existing data (M.T-1) or enables selective overwriting (M.T-2).
[0010] Another object of the present disclosure is to provide a generalized mapping strategy that simplifies design processes and supports scalability across various arithmetic and logic functions, ensuring adaptability to diverse computing needs without compromising performance or memory endurance.
SUMMARY
[0011] Various aspects of the present disclosure relate to in-memory computing (IMC) and memory array design, focusing on efficient memory-based computations. More particularly, it pertains to bi-directional processing and mapping techniques to reduce energy consumption, optimize area use, and minimize latency.
[0012] An aspect of this disclosure addresses a method for optimizing in-memory computation (IMC) within a memory array, involving binary addition and initial carry-in operations executed with majority gates in optimized mapping configurations (achieving objectives O1 and O3) to reduce memory area. The method introduces two mapping techniques: M.T-1, which utilizes an optimal processing area to store intermediate computations without overwriting data, and M.T-2, a 4×n processing area that allows overwriting, maximizing space efficiency. The memory is initialized to a logic "0" state, with inputs assigned as per constraints (C2 and C3). Simultaneous row and column computations enhance parallelism, where M.T-1 preserves intermediate data, and M.T-2 optimizes space with overwriting. Additionally, a majority-based Ladner-Fischer Parallel-Prefix (LF-PP) adder is implemented, minimizing gates and computational levels, allowing efficient in-memory operation. Performance is evaluated with Cadence Virtuoso and an STT-MTJ model at 1.1 V, further underscoring area and energy optimization.
[0013] Various objects, features, aspects, and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which numerals represent like components.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0014] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. The diagrams are for illustration only, which thus is not a limitation of the present disclosure.
[0015] In the figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
[0016] FIG. 1 illustrates an exemplary architecture with (a) objectives and (b) constraints, with examples demonstrating the mapping process, and (c) a detailed flow chart of mapping steps to perform the in-memory implementation of an LF-PP adder design, in accordance with an embodiment of the present disclosure.
[0017] FIG. 2 illustrates an exemplary architecture demonstrating (a) the cycle of the in-memory implementation of a majority gate, performed as a READ operation via a sense amplifier, which accurately measures the effective resistance of memory cells within a 3x3 memory array given specific inputs, and (b) a row-column computation verified through transient simulation, in accordance with an embodiment of the present disclosure.
[0018] FIG. 3 illustrates an exemplary architecture showing (a) an existing design, where optimization is achieved by implementing the LF-PP adder with fewer majority gates, as demonstrated in (b) the proposed design, with regions of gate reduction highlighted by circles for clarity; (c) a depiction of the four-bit optimized LF-PP adder comprising six levels of majority (M) and NOT gates; and (d) simulation-based verification of the functionality for the four-bit LF-PP adder, assuming an initial carry of 0, in accordance with an embodiment of the present disclosure.
[0019] FIG. 4 illustrates an exemplary architecture of the proposed mapping techniques for the 4-bit LF-PP adder across all computation levels within the memory array, where all majority gates at each level are executed in parallel, in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0020] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
[0021] The present disclosure relates to comprehensive solutions to major challenges in memory array design, offering substantial advancements over traditional methods. One critical improvement is in energy consumption, where the present disclosure significantly reduces energy use by optimizing data placement and reuse within memory cells. This approach minimizes the frequency of write operations, which are typically energy-intensive, thereby enhancing overall energy efficiency. Additionally, area efficiency is improved through the innovative mapping techniques that store and process data more compactly, reducing the number of memory cells required. This efficient use of space is essential for developing smaller, more compact devices without compromising functionality.
[0022] Furthermore, the present disclosure enhances memory cell endurance, a persistent issue in non-volatile memory systems where frequent write operations can lead to cell degradation over time. By minimizing unnecessary data replacements, the design preserves cell longevity and optimizes the efficiency of write operations, ensuring sustained performance. Another significant advancement is in mapping complexity and application flexibility. Conventional memory architectures often struggle to adapt scalable and versatile mapping logic for a range of applications. Our adaptable mapping methodology supports different adder designs and logic functions, enabling customization based on application needs, whether for minimized area usage or high-speed performance.
[0023] FIG. 1 illustrates an exemplary architecture with (a) objectives and (b) constraints, with examples demonstrating the mapping process, and (c) a detailed flow chart of mapping steps to perform the in-memory implementation of an LF-PP adder design, in accordance with an embodiment of the present disclosure.
[0024] In an embodiment, referring to FIG. 1 (a), the primary objective 100a of the proposed disclosure may be configured to introduce a fully non-volatile in-memory mapping technique, capable of supporting a broad spectrum of arithmetic and logic functions within memory arrays. To demonstrate its capabilities, the mapping methodology has been applied to an LF-PP adder, leveraging the majority gate operation as a fundamental logical primitive. The proposed IMC mapping technique can be designed with optimization goals aimed at achieving substantial improvements over conventional memory mapping approaches.
[0025] Specifically, the mapping process of a majority-based parallel prefix adder (PP adder) onto the memory array is structured as an optimization problem that addresses multiple critical objectives under defined constraints. The essential objectives for the non-volatile IMC mapping process are threefold: (1) reduction of area utilized in computation (Objective O1), (2) optimization of latency, or the required computation cycles, within the adder (Objective O2), and (3) minimization of energy consumption, strongly influenced by the number of write operations (Objective O3).
[0026] In an exemplary embodiment, by systematically minimizing both the area and energy overheads, the present disclosure can achieve significant reductions in overall energy consumption, area, and latency, surpassing the benchmarks established in existing methods and prior literature. Key contributors to energy consumption include cell state changes during write operations and read operations executed for the majority logic functions. Therefore, reducing energy usage for addition is achieved by closely monitoring and reducing write operations, fulfilling Objective O3.
[0027] Furthermore, minimizing the number of majority operations directly contributes to reduced area and write operation requirements, thereby optimizing both area and energy consumption (Objectives O1 and O3). In essence, design-level refinements applied to the existing majority-based LF adder, as implemented within an IMC environment, provide enhanced performance across all optimization metrics. Through this approach, the present disclosure can present a high-efficiency, application-flexible IMC mapping technique that addresses critical limitations of prior memory-based logic computation methods.
[0028] In an exemplary embodiment, the optimization of the proposed mapping technique is governed by a set of design constraints essential for ensuring efficient, reliable, and scalable in-memory logic operations within the memory array. The constraints 100b as shown in FIG. 1 (b), integral to the non-volatile in-memory mapping methodology, can include a Row and Column Processing Constraint (C1) that can restrict each row or column to a single majority logic operation per computational level, ensuring that concurrent operations do not occur within the same row or column. This constraint is critical, as each row and column is connected to a single sense amplifier (SA), which supports only one computation at a time to prevent output conflicts and potential errors.
[0029] A Write Operation Constraint (C2) mandates that write operations may be limited to the column direction, consistent with advanced design standards, and allows for a maximum of three cells to be written within a column simultaneously. This limitation on the number of cells written in a single column effectively manages power consumption and heat generation, contributing to the stability and longevity of the non-volatile memory cells.
[0030] Furthermore, a Consecutive Location Computation Constraint (C3) specifies that for three-input majority operations, the inputs must be placed in consecutive row or column locations. This arrangement minimizes decoding complexity while aligning with peripheral circuitry, facilitating efficient in-memory computation. Additionally, this constraint enables simultaneous processing in rows and columns, reducing data movement and enhancing computational efficiency. Together, the constraints 100b can ensure the mapping technique is optimized for energy efficiency, effective area utilization, and compatibility with memory array architecture, supporting a streamlined and high-performance design suitable for executing complex logic and arithmetic functions in-memory.
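To make the interaction of constraints C1-C3 concrete, the sketch below checks one computational level's worth of scheduled operations before they are issued. The data structures (`MajorityOp`, `ColumnWrite`) and their fields are hypothetical illustrations, not drawn from the disclosure; C3 is satisfied by construction because each operation's inputs occupy three consecutive locations.

```python
from dataclasses import dataclass

@dataclass
class MajorityOp:
    direction: str  # "row" or "column": which sense amplifier performs the READ
    index: int      # the row/column whose sense amplifier is used
    start: int      # first of the three consecutive input locations (C3)

@dataclass
class ColumnWrite:
    column: int
    cells: list     # row indices written in this column in one cycle

def check_level(ops, writes):
    """Return True when one level's operations respect C1-C3."""
    # C1: each row/column sense amplifier serves one majority op per level.
    used = [(op.direction, op.index) for op in ops]
    if len(used) != len(set(used)):
        return False
    # C2: writes are column-directed, at most three cells per column at once.
    if any(len(w.cells) > 3 for w in writes):
        return False
    # C3 holds by construction: the inputs occupy start, start+1, start+2.
    return True
```

A scheduler built along these lines would reject, for example, two majority READs issued on the same row in one level, since both would contend for that row's single sense amplifier.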
[0031] In an exemplary embodiment, the methodology for mapping within the memory array may be designed to meet specific objectives 100a, such as minimizing area usage, energy consumption, and latency, while adhering to operational constraints 100b. The present disclosure can introduce two distinct mapping techniques, M.T-1 and M.T-2, that enable flexible adaptation based on application-specific needs. M.T-1 may be designed to prevent overwriting cells in successive computations by utilizing different cells to store intermediate results, reducing write operations, and enhancing cell endurance. M.T-2, on the other hand, can permit overwriting of cells at various computational levels, allowing for more compact area usage and higher processing density, which is particularly beneficial in applications where high-density, scalable operations are essential. For cases where frequent rewriting optimizes the mapping area, such as in multi-level design computations, M.T-2 is preferable. Conversely, M.T-1 is more suited for high-speed applications, as it minimizes unnecessary data replacement, thereby preserving the endurance of memory cells.
[0032] In an exemplary embodiment, using the majority-based LF-PP adder as a case study, the mapping process can become an optimization task. This entails balancing objectives 100a, such as reduced area, lower energy usage, and optimized latency, against the established constraints 100b on row-column processing and write operations. The process flow 100c for implementing the n-bit majority-based LF-PP in-memory adder may be depicted in FIG. 1, illustrating both mapping techniques while addressing each objective and constraint step-by-step. The approach can ensure that the mapping is efficient, scalable, and adaptable to diverse computational requirements, providing a robust solution for high-performance, non-volatile in-memory computing.
[0033] In an exemplary embodiment, referring to FIG. 1 (c), a flow diagram depicts the proposed method 100c of mapping steps to perform the in-memory implementation of an LF-PP adder design. At step 102, the method 100c includes adding two binary numbers, A and B, alongside an initial carry-in value, Cin. Here, A is represented as A = aₙaₙ₋₁…a₀, where each aᵢ denotes a bit in the binary number, with aₙ being the most significant bit (MSB) and a₀ being the least significant bit (LSB). Similarly, B is represented as B = bₙbₙ₋₁…b₀, indicating its own sequence of bits with the same significance hierarchy. The inclusion of Cin serves as an initial carry input, essential for handling scenarios where previous additions have produced a carry that must be incorporated into the current computation.
[0034] Continuing further, at step 104, the method 100c includes optimizing an LF-PP adder design that employs majority gates as a fundamental logic primitive, reducing the gate count to minimize area and computational complexity (O1) and to decrease the energy consumption associated with write operations (O3). The optimization further includes configuring the LF-PP adder to utilize majority gates in a manner that enables parallel execution of operations, thereby enhancing the overall efficiency of the memory array utilized for in-memory computing. The incorporation of majority gates facilitates a compact and efficient design, which minimizes the physical area required for implementation while concurrently lowering power dissipation during addition operations.
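The role majority gates play in this step can be sketched with the textbook majority-logic full adder, in which the carry is a single majority gate and the sum is recovered from two further majority gates plus inverters. This is a generic illustration of the primitive, not the patented M1-M14 gate network:

```python
def M(a, b, c):
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

def full_adder(a, b, cin):
    cout = M(a, b, cin)                     # carry is one majority gate
    s = M(1 - cout, cin, M(a, b, 1 - cin))  # sum via two more majority gates
    return s, cout

def add(a_bits, b_bits, cin=0):
    """Ripple addition over LSB-first bit lists; returns (sum_bits, cout)."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry
```

A parallel-prefix organization such as the LF-PP adder computes the same carries in logarithmic depth rather than this linear ripple, which is where the latency benefit (O2) comes from.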
[0035] Continuing further, at step 106, the method 100c includes a mapping area designated explicitly for storing intermediate computation results during the addition of two binary numbers, where two distinct mapping techniques may be employed. M.T-1 Mapping Area: the processing area may be dynamically determined based on the optimization criteria established for the computation, enabling the identification of one of the most effective mappings available. This flexibility can allow for enhanced efficiency in terms of both space utilization and operational effectiveness during the computation process. M.T-2 Mapping Area: the processing area may be configured as a fixed 4×n matrix, where n corresponds to the bit-width of the numbers being processed. The predetermined dimensions of this matrix facilitate a standardized approach to managing intermediate results while ensuring sufficient space allocation for computations, thereby contributing to the overall efficiency of the in-memory computing architecture.
[0036] Continuing further, at step 108, method 100c includes initializing the memory array to a default logic state, specifically to a logic "0". In this initialization phase, the memory cells within the array are set to a parallel state, where each cell consistently holds the value of "0". The uniform initialization can serve to establish a baseline condition for subsequent computational processes, ensuring that any prior states or data do not influence the current operation. By maintaining the memory cells in this parallel logic state, the approach allows for selective writing of "1" only where required, taking advantage of the pre-existing "0" state. This initialization facilitates accurate computation of the binary addition process, thereby enhancing the reliability and integrity of the results obtained during subsequent arithmetic operations.
[0037] Continuing further, at step 110, method 100c includes storing the requisite initial inputs within the processing array, where specific memory cells are assigned a logic state of "1". The assigning/writing process may be conducted in accordance with constraints C2 and C3, which dictate the restrictions on the number of rows in a column that can be accessed simultaneously for write operations and computational arrangements. The initial inputs, corresponding to the binary numbers to be added, are strategically placed in predetermined cells to ensure optimal mapping efficiency. The optimized mapping may be achieved by analyzing the architecture of the memory array and selecting cell locations that minimize energy consumption and area utilization while adhering to the operational constraints.
[0038] Continuing further, at step 112, method 100c can include reading the majority-gate outputs from the current computational level within the memory array. The operation may be executed by employing both row-wise and column-wise computation methodologies, thereby enhancing the parallel processing capabilities of the system. Adhering to constraints C1 and C3, which govern the execution of majority logic operations in distinct rows and columns by placing majority inputs in consecutive locations, the simultaneous reading process can ensure that only one majority operation is conducted per row or column at any given computational level to prevent output conflicts.
[0039] Continuing further, at step 114, method 100c can write the outputs generated from the current computation level to predetermined locations within the memory array, thereby ensuring the accuracy and availability of inputs for subsequent computational levels. This writing operation is conducted under constraints C1, C2, and C3, while optimizing objectives O1, O2, and O3.
[0040] In the case of M.T-1 mapping, the outputs are written to designated memory locations without overwriting any existing data, thereby preserving data integrity across multiple computation levels. Conversely, for M.T-2 mapping, the outputs are written by overwriting the existing data in the specified locations, which facilitates efficient space utilization within the memory array. The dual approach to output writing is critical for maintaining operational continuity and enhancing the performance of the majority-based adder design while concurrently adhering to the established design constraints and optimization objectives. Read and write operations are performed continuously until all computational levels are completed.
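The write-cost trade-off between the two techniques can be illustrated by counting write cycles for a sequence of column write batches, using the two-cycle overwrite rule the disclosure states for M.T-2 (0s in one cycle, 1s in the next). The batch structure here is invented purely for illustration:

```python
def mt1_cycles(column_batches):
    """M.T-1: cells start at '0' and are never overwritten, so each
    column batch of new '1's costs a single write cycle."""
    return len(column_batches)

def mt2_cycles(column_batches):
    """M.T-2: a batch that overwrites live data takes two cycles
    (0s in one cycle, 1s in the next); a fresh batch takes one."""
    return sum(2 if batch["overwrites"] else 1 for batch in column_batches)
```

M.T-2 pays these extra cycles in exchange for reusing cells, which is what shrinks its processing area relative to M.T-1.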
[0041] To assess the performance of the proposed mapping techniques, simulations were conducted using the Cadence Virtuoso System Design Platform (version IC6.1.7-64b.500.19). All simulations were executed utilizing the 45 nm CMOS generic process design kit (gpdk) in conjunction with an advanced STT-MTJ model, which was implemented in Verilog-A. Key parameters of the STT-MTJ model included a low resistance state (LRS) of 6.21 kΩ and a high resistance state (HRS) of 16.8 kΩ, with a supply voltage set to 1.1 V. To enhance the accuracy of the output readings, a time-based sense amplifier was employed for the reading of majority outputs. Notably, the proposed approach is designed to perform computations simultaneously in both row-wise and column-wise directions, thereby optimizing computational efficiency and improving overall system performance within the memory array.
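The stated device parameters allow a back-of-the-envelope check of the resistance-based majority READ. The sketch below assumes logic "1" maps to the low resistance state and uses a fixed mid-point threshold; both are illustrative assumptions, since the actual design resolves the READ with a time-based sense amplifier:

```python
LRS, HRS = 6.21e3, 16.8e3  # stated STT-MTJ resistances, in ohms

def effective_resistance(bits):
    """Parallel combination of the three cells selected for the READ.
    Assumes logic 1 -> LRS, logic 0 -> HRS (an illustrative mapping)."""
    return 1.0 / sum(1.0 / (LRS if b else HRS) for b in bits)

def majority_read(bits, threshold=3.0e3):
    """Two or more LRS cells pull the parallel resistance below the
    (assumed) threshold, so the READ resolves the majority value."""
    return 1 if effective_resistance(bits) < threshold else 0
```

With these values, three LRS cells give about 2.07 kΩ, two give about 2.62 kΩ, one gives about 3.57 kΩ, and none give 5.6 kΩ, so any threshold between roughly 2.7 kΩ and 3.5 kΩ separates the majority-1 cases from the majority-0 cases.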
[0042] FIG. 2 illustrates an exemplary architecture demonstrating (a) the cycle of the in-memory implementation of a majority gate, performed as a READ operation via a sense amplifier, which accurately measures the effective resistance of memory cells within a 3x3 memory array given specific inputs, and (b) a row-column computation verified through transient simulation, in accordance with an embodiment of the present disclosure.
[0043] FIG. 2 (a) and FIG. 2 (b) (200a & 200b) can depict the simultaneous computation performed within the memory array, specifically highlighting the interactions occurring in row 1 (designated as rMAJ) and column 1 (labeled as MAJ). In this arrangement, the memory cells may be configured to store specific values: the inputs A, B, and C are represented as (1, 0, 0) in column C1, corresponding to Cell 00, Cell 03, and Cell 06, while row R1 holds the values (1, 1, 1) in Cells 00, 01, and 02. This configuration generates logic '1' and '0' simultaneously in both row and column directions, demonstrating bi-directional computing capabilities. The bi-directional computing strategy can significantly accelerate the execution of computational functions, facilitate efficient and compact mapping of data, and result in minimal penalties in terms of energy consumption and physical area requirements. To further illustrate the advantages of the approach, the proposed mapping methodologies may be employed in the implementation of the LF-PP adder, showcasing the effectiveness of the optimized design in enhancing performance within the memory array.
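The row-column example described for FIG. 2 can be reproduced numerically. The Cell 00-08 row-major layout is assumed from the figure description:

```python
# 3x3 array from FIG. 2: row R1 = (1, 1, 1), column C1 = (1, 0, 0).
array = [[1, 1, 1],   # Cells 00, 01, 02 (row R1)
         [0, 0, 0],   # Cells 03, 04, 05
         [0, 0, 0]]   # Cells 06, 07, 08

def M(a, b, c):
    """Three-input majority gate."""
    return 1 if a + b + c >= 2 else 0

# The two READs issued in the same cycle use different sense amplifiers
# (one row SA, one column SA), so constraint C1 is respected.
rMAJ = M(*array[0])                          # row R1: M(1, 1, 1) = 1
cMAJ = M(*[array[r][0] for r in range(3)])   # column C1: M(1, 0, 0) = 0
```

The two results, logic '1' row-wise and logic '0' column-wise, match the simultaneous outputs the figure describes.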
[0044] FIG. 3 illustrates an exemplary architecture showing (a) an existing design (300a), where optimization is achieved by implementing the LF-PP adder with fewer majority gates, as demonstrated in (b) the proposed design (300b), with regions of gate reduction highlighted by circles for clarity; (c) a depiction of the four-bit optimized LF-PP adder comprising six levels of majority (M) and NOT gates; and (d) simulation-based verification of the functionality for the four-bit LF-PP adder, assuming an initial carry of 0, in accordance with an embodiment of the present disclosure.
[0045] In an exemplary embodiment, the present disclosure can relate to an optimized design of the LF-PP adder, as illustrated in FIG. 3 (a) and 3 (b). The existing LF-PP adder design may be enhanced by strategically reducing the number of majority gates utilized, without increasing the total number of levels in the architecture. This results in a more efficient design that maintains functionality while improving performance metrics. The optimization may be achieved by utilizing intermediate computations and strategically deferring the computation of intermediate carries to subsequent levels when such computations are not necessary for the immediate calculation of the next carry.
[0046] FIG. 3 (c) (300c) depicts an optimized implementation of a 4-bit LF-PP adder using majority gates, demonstrating the streamlined configuration. Additionally, transient simulation waveforms relevant to the optimized design may be presented in FIG. 3 (d) (300d), providing empirical validation of the proposed enhancements. The LF-PP adder design can comprise two distinct functional blocks: the carry generate block (comprising majority gates M1 through M6) and the sum generate block (comprising majority gates M7 through M14), with each level of computation distinctly highlighted in varying colors to facilitate understanding of the design's hierarchical structure.
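The carry-generate block can be sketched functionally with generate and propagate signals formed by majority gates (g = M(a, b, 0) is AND, p = M(a, b, 1) is OR) combined through the standard parallel-prefix operator. The sequential scan below computes the same prefix carries a Ladner-Fischer tree computes in logarithmic depth; the patent's specific six-level M1-M14 network is not reproduced:

```python
def M(a, b, c):
    """Three-input majority gate."""
    return 1 if a + b + c >= 2 else 0

def combine(gp_hi, gp_lo):
    """Prefix operator: (g, p) o (g', p') = (g OR (p AND g'), p AND p')."""
    g_hi, p_hi = gp_hi
    g_lo, p_lo = gp_lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def lf_carries(a_bits, b_bits, cin=0):
    """LSB-first prefix carries C_1..C_n for an n-bit addition."""
    gp = [(M(a, b, 0), M(a, b, 1)) for a, b in zip(a_bits, b_bits)]
    # Sequential scan, functionally equivalent to the parallel LF tree;
    # cin is folded in as the initial generate term.
    carries, acc = [], (cin, 0)
    for x in gp:
        acc = combine(x, acc)
        carries.append(acc[0])
    return carries
```

The sum-generate block then needs only sᵢ = aᵢ XOR bᵢ XOR Cᵢ per bit, which the disclosure realizes with further majority and NOT gates.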
[0047] FIG. 4 illustrates an exemplary architecture of the proposed mapping techniques for the 4-bit LF-PP adder across all computation levels within the memory array, where all majority gates at each level are executed in parallel, in accordance with an embodiment of the present disclosure.
[0048] Referring to FIG. 4, the proposed mapping techniques may be employed for the 4-bit LF-PP adder across all computation levels within the memory array, demonstrating that all majority gates at each level are executed concurrently. The in-memory computational steps associated with the different logic levels are also delineated. It is noteworthy that, for the purpose of evaluating performance parameters, the outputs of the adder, including Cout, although written to memory, may be excluded from validation as shown.
[0049] In an exemplary embodiment, the M.T-1 technique may be elucidated as follows: (i) The majority operations at each level are distinctly highlighted in varying colors, akin to the representation in FIG. 3 (c). (ii) The mapping process is meticulously detailed at each level without overwriting the content in the 4×8 processing area, with new writes prominently highlighted in red. The majority operations may be enclosed within dotted boxes that correspond to the colors depicted in (i). (iii) A comprehensive outline of the steps corresponding to the computational cycles is presented, ensuring clarity and understanding of the process.
[0050] The steps involved in M.T-1 (400a) are described as follows:
1. After writing the required initial inputs, execute the majority operations M1, M2, and M3 as a READ operation, as highlighted by the corresponding color in the dotted box, involving one column and two row operations.
2. Write a1, b1 in the locations as shown, with the new writes in memory at each level highlighted in red. The outputs from the previous level serve as inputs for the gates at the next level; hence, the intermediate outputs m1, m2, C1, and its complement are written to the corresponding locations.
3. Perform the majority operation M4 (a1, b1, C1) as a single column READ operation.
4. Write C2 and its complement to the corresponding locations.
5. Execute majority operations M5 and M6, involving one column and one row operation.
6. Write (C̅3, Cout, C̅4, C3, a1).
7. Execute majority operations M7 to M10, involving two column and two row operations.
8. Write (m8, m9, m10, m7, Cin).
9. Execute majority operations M11 to M14, involving two column and two row operations.
10. Write (S0, S1, S2, S3) to the memory array.
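The majority-gate arithmetic underlying these steps can be sketched behaviorally in Python. This is an illustrative bit-level model only (the function names `maj`, `majority_full_adder`, and `add_4bit` are ours, and the nested-majority sum identity shown is a standard majority-logic form, not the exact M1 to M14 mapping of FIG. 4): the carry obeys Cout = Maj(a, b, Cin), and the sum is recovered from majority operations over complemented carries, which the sense amplifier produces alongside the true outputs (see the note to Table 1).

```python
def maj(a, b, c):
    """3-input majority: returns 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def majority_full_adder(a, b, cin):
    """One full-adder bit using only majority gates and complements.

    Carry: cout = Maj(a, b, cin)                     (standard identity)
    Sum:   s    = Maj(~cout, cin, Maj(a, b, ~cin))   (standard identity)
    """
    cout = maj(a, b, cin)
    s = maj(1 - cout, cin, maj(a, b, 1 - cin))
    return s, cout

def add_4bit(a, b, cin=0):
    """Behavioral 4-bit add (a3a2a1a0 + b3b2b1b0 + Cin) -> (S3..S0, Cout)."""
    total, carry = 0, cin
    for i in range(4):
        s, carry = majority_full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

# Exhaustive check against ordinary binary addition:
assert all(add_4bit(a, b, c) == ((a + b + c) & 0xF, (a + b + c) >> 4)
           for a in range(16) for b in range(16) for c in (0, 1))
```

The model is sequential for clarity; in the in-memory mapping above, the independent majority operations of each level execute in parallel within a single READ cycle.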
[0051] Furthermore, the presentation of the M.T-2 technique (400b) shares similarities with the previous technique in sections (i) and (iii). However, a key distinction is noted in (ii), where the mapping process involves the replacement of content within the 4×4 processing area. In this scenario, new writes are again highlighted in red, and the majority operations are enclosed within dotted boxes to maintain consistency in representation.
[0052] The steps involved in M.T-2 are described as follows. Note that the writing process here takes two cycles because overwriting existing data in memory requires writing the inputs with 0s (1s) in one cycle and with 1s (0s) in the next cycle within a column.
1. After writing the required initial inputs, execute the majority operations M1, M2, and M3 as a READ operation, as highlighted by the corresponding color in the dotted box, involving three column operations.
2. Write a1, b1 in the locations as shown, with the new writes in memory at each level highlighted in red. The outputs from the previous level serve as inputs for the gates at the next level; hence, the intermediate outputs m1, m2, C1, and its complement are written to the corresponding locations. (2 Cycles)
3. Perform the majority operation M4 (a1, b1, C1) as a single row READ operation.
4. Write a0, C2 and its complement to the corresponding locations. (2 Cycles)
5. Execute majority operations M5 and M6, involving one column and one row operation.
6. Write (C̅3, C3, Cout, C̅4, a2, a3, b1) at the specified locations. (2 Cycles)
7. Execute majority operations M7 to M10, involving four row operations.
8. Write (m7, m8, m9, m10, Cin). (2 Cycles)
9. Execute majority operations M11 to M14, involving four row operations.
10. Write (S0, S1, S2, S3) to the memory array. (1 Cycle)
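The cycle totals for the two techniques at n = 4 can be tallied directly from the step lists above. The sketch below is an illustrative model (the step lists and function name are ours, with per-step costs inferred from the text): each parallel majority READ costs one cycle; M.T-1 writes cost one cycle each, while M.T-2 writes cost two cycles apiece, except the final single-cycle write, because the shared write driver must write 0s and 1s in separate cycles.

```python
# Each tuple: (operation, cycles), taken from the M.T-1 and M.T-2 step lists.
MT1 = [("read M1-M3", 1), ("write", 1), ("read M4", 1), ("write", 1),
       ("read M5-M6", 1), ("write", 1), ("read M7-M10", 1), ("write", 1),
       ("read M11-M14", 1), ("write S0-S3", 1)]

MT2 = [("read M1-M3", 1), ("write", 2), ("read M4", 1), ("write", 2),
       ("read M5-M6", 1), ("write", 2), ("read M7-M10", 1), ("write", 2),
       ("read M11-M14", 1), ("write S0-S3", 1)]

def total_cycles(steps):
    """Sum the per-step cycle costs of a mapping's schedule."""
    return sum(cycles for _, cycles in steps)

# Totals match Tables 2 and 3 (and the Table 4 latency formulas) for n = 4:
assert total_cycles(MT1) == 10   # 2*log2(4) + 6
assert total_cycles(MT2) == 14   # 3*log2(4) + 8
```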
[0053] In an exemplary embodiment, the present disclosure can present a method for adding two binary numbers and an input carry (a3a2a1a0, b3b2b1b0, and Cin), organized within a designated processing area as illustrated in FIG. 4. The arrangement can apply to both mapping techniques: M.T-1, depicted in FIG. 4 (a), and M.T-2, shown in FIG. 4 (b). In the first computation cycle, the majority gates 1, 2, and 3 of the first level are executed concurrently, with each majority operation conducted as a READ operation. At subsequent levels, the outputs from the initial majority gates serve as inputs for the next set of majority gates to be executed. Accordingly, the outputs from the first level (m1, m2, m3/C1) are written to the specific memory locations required for the following logic level, ensuring compliance with all objectives and constraints defined by the memory array architecture.
[0054] Furthermore, for M.T-1, the processing area can include a 4×8 matrix, initialized to a logic '0' state, ensuring that all memory cells maintain a parallel state for optimal performance in storing intermediate computation results. In this technique, outputs may be written in consecutive locations without overwriting existing data, thereby preserving data endurance. In contrast, M.T-2 features a fixed row count of 4, while the column count corresponds to n, reflecting the bit-width of the adder. Outputs in the technique are written by overwriting existing data and optimizing space utilization within the memory array. This dual mapping strategy enhances computational efficiency while adhering to the design constraints, thereby improving overall performance in the addition process.
[0055] In an exemplary embodiment, Table 1 presents a comparative analysis highlighting the advantages of the proposed n-bit In-Memory LF-PP Adder design relative to conventional implementations. The table underscores the optimized architecture of the proposed adder, specifically demonstrating a reduction in the number of majority gates required for operation. Notably, the reduction may be achieved without increasing the number of computational levels, thereby maintaining processing efficiency. The outlined benefits signify an advancement over existing designs, showcasing improvements in terms of resource efficiency and computational compactness, which contribute to enhanced overall performance in in-memory computing environments.
Table 1: Comparison of the proposed n-bit In-Memory LF-PP Adder design with existing work in "Accelerated addition in resistive RAM array using parallel-friendly majority gates".
Ref | No. of bit adder (n) | No. of Majority gates | Total number of stages/levels |
---|---|---|---|
Proposed Work (Row-Column) | 4 | 14 | 5 |
 | 8 | 32 | 6 |
 | 16 | 70 | 7 |
Existing Work (Column): "Accelerated addition in resistive RAM array using parallel-friendly majority gates" | 4 | 16 | 6 |
 | 8 | 36 | 7 |
 | 16 | 76 | 8 |
Note: In our proposed work, the number of stages is calculated by considering both the true and complemented outputs of generated carries at a single level during the read operation, as the sense amplifier can produce both simultaneously.
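The stage counts in Table 1 follow a simple pattern in the bit-width (an observation from the tabulated values, not a formula stated in the disclosure): the proposed design uses log2(n) + 3 levels versus log2(n) + 4 for the existing work, a saving of one level at every bit-width. A quick consistency check:

```python
import math

# Stage/level counts as reported in Table 1.
proposed = {4: 5, 8: 6, 16: 7}   # Proposed Work (Row-Column)
existing = {4: 6, 8: 7, 16: 8}   # Existing Work (Column)

for n in (4, 8, 16):
    k = int(math.log2(n))
    assert proposed[n] == k + 3      # one level fewer at every n
    assert existing[n] == k + 4
```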
[0056] In an exemplary embodiment, Tables 2 and 3 furnish detailed comparative analyses of the proposed mapping techniques, designated as M.T-1 and M.T-2, against recent advancements documented in prior literature. The tables specifically highlight the achieved improvements in critical parameters, including the reduction in the number of memory cells utilized and the decrease in required write operations per computation cycle. Such reductions directly impact energy consumption by minimizing power-intensive write operations and enabling a more compact and efficient memory array area. The disclosed mapping strategies, as demonstrated in Tables 2 and 3, thus provide substantial advancements in energy efficiency and array area optimization compared to conventional approaches.
Table 2: Comparison of the proposed M.T-1 for an n-bit (n = 4, 8, and 16) In-Memory adder with Reference "Accelerated addition in resistive RAM array using parallel-friendly majority gates" in terms of memory cells, write operations, and computation cycles.
Mapping Approach in IMC array | n-bit PP Adder | No. of Memory Cells | No. of Write Operations$ | No. of Unused Cells | No. of Computation Cycles | Memory Type - Logic Primitive | Technology |
---|---|---|---|---|---|---|---|
Row-Column (Proposed Work) | 4 | 32 | 28 | 4 | 10 | STT-MRAM - Majority | 45 nm |
 | 8 | 66 | 64 | 2 | 12 | | |
 | 16 | 165 | 140 | 25 | 14 | | |
Column (Existing Work): "Accelerated addition in resistive RAM array using parallel-friendly majority gates" | 4 | 288 | 36 | 252 | 14 | RRAM - Majority | 130 nm |
$ The number of write operations is calculated by considering both the input data and the intermediate results.
Table 3: Comparison of the proposed M.T-2 for an n-bit (n = 4, 8, and 16) In-Memory adder with Reference "Parallel-prefix adder in spin-orbit torque magnetic RAM for high bit-width non-volatile computation" in terms of memory cells, write operations, and computation cycles.
Mapping Approach in IMC array | n-bit PP Adder | No. of Memory Cells | No. of Write Operations | No. of Computation Cycles | Remarks | Memory Type - Logic Primitive | Technology |
---|---|---|---|---|---|---|---|
Row-Column (Proposed Work) | 4 | 16 | 29 | 14 | Since one write driver is connected for all engaged rows, writing '0' and writing '1' consume separate cycles, resulting in a slight increase in computational cycles. | STT-MRAM - Majority | 45 nm |
 | 8 | 32 | 62 | 17 | | | |
 | 16 | 64 | 147 | 20 | | | |
Column (Existing Work): "Parallel-prefix adder in spin-orbit torque magnetic RAM for high bit-width non-volatile computation" | 4 | NR | NR | 10 | The work considered one write driver for each row, which is why it shows fewer computation cycles. | SOT-MRAM - Majority | 40 nm & 180 nm |
 | 16 | NR | NR | 14 | | | |
NR denotes "Not Reported".
[0057] In an exemplary embodiment, Table 4 provides a comparative assessment of the proposed mapping techniques, M.T-1 and M.T-2, for an n-bit In-Memory adder against the most recent relevant works. In the table, the energy calculations for both the proposed methods and existing designs are derived from the energy consumed during in-memory addition operations. These calculations account for the energy required for switching magnetoresistive random-access memory (MRAM) cells during write operations as well as the energy expended by the sense amplifier (SA) during majority read operations. Given that write operations (Ewrite = 284.3 fJ for spin-transfer torque MRAM) demand significantly more energy than read operations (Eread = 64 fJ for STT-MRAM), the total energy consumption is estimated predominantly from the energy attributed to write operations. The approach yields a close approximation of the overall energy requirement for the in-memory addition process, highlighting the energy efficiency improvements of the proposed techniques over existing methods.
Table 4: Comparison of the proposed M.T.s for an n-bit In-Memory adder with existing works.
Mapping Approach | Array Area (# rows & # columns) | Energy$ | Latency (# cycles) |
---|---|---|---|
Proposed Work M.T-1 | Rows: (3/2)(log2n)^2 − (11/2)log2n + 9; Columns: (1/2)(log2n)^2 + (1/2)log2n + 5 | Ewrite × [20(log2n)^2 − 64 log2n + 76] | 2 log2n + 6 |
Proposed Work M.T-2 | 4 × n | Ewrite × [26(log2n)^2 − 97 log2n + 119] | 3 log2n + 8 |
Existing Work: "Accelerated addition in resistive RAM array using parallel-friendly majority gates" | 6 × (8n + 16) | Ewrite × [3n log2n + 4n + 6] | 4 log2n + 6 |
Existing Work: "Parallel-prefix adder in spin-orbit torque magnetic RAM for high bit-width non-volatile computation" | m × n input array and 3 × n medium array | NR | 2 log2n + 6 |
NR denotes "Not Reported". $ The energy reported here is approximated by the write energy; Ewrite is the energy per single-bit write.
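The closed-form expressions in Table 4 can be cross-checked against the per-n values reported in Tables 2 and 3. The script below is an illustrative consistency check (function and variable names are ours): the M.T-1 row and column formulas reproduce the 32-, 66-, and 165-cell arrays of Table 2, and the bracketed energy polynomials reproduce the write-operation counts (28/64/140 for M.T-1 and 29/62/147 for M.T-2), consistent with total energy being approximated as Ewrite times the number of writes.

```python
import math

def mt1_area(n):
    """M.T-1 array dimensions per Table 4: (rows, columns)."""
    k = math.log2(n)
    rows = 1.5 * k**2 - 5.5 * k + 9
    cols = 0.5 * k**2 + 0.5 * k + 5
    return int(rows), int(cols)

def mt1_writes(n):
    """Bracketed factor of the M.T-1 energy expression (write count)."""
    k = math.log2(n)
    return int(20 * k**2 - 64 * k + 76)

def mt2_writes(n):
    """Bracketed factor of the M.T-2 energy expression (write count)."""
    k = math.log2(n)
    return int(26 * k**2 - 97 * k + 119)

def latency(n, technique):
    """Latency in cycles per Table 4."""
    k = math.log2(n)
    return int(2 * k + 6) if technique == "M.T-1" else int(3 * k + 8)

# Cross-check M.T-1 against Table 2 (cells, writes, cycles):
for n, cells, writes, cycles in [(4, 32, 28, 10), (8, 66, 64, 12), (16, 165, 140, 14)]:
    r, c = mt1_area(n)
    assert r * c == cells and mt1_writes(n) == writes and latency(n, "M.T-1") == cycles

# Cross-check M.T-2 against Table 3 (writes, cycles):
for n, writes, cycles in [(4, 29, 14), (8, 62, 17), (16, 147, 20)]:
    assert mt2_writes(n) == writes and latency(n, "M.T-2") == cycles

# Write-dominated energy estimate for the 4-bit M.T-1 adder (Ewrite = 284.3 fJ):
energy_fj = mt1_writes(4) * 284.3
```

For n = 4 under M.T-1, this estimate comes to 28 × 284.3 fJ, roughly 8 pJ, dominated by writes as paragraph [0057] notes.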
[0058] In summary, the present disclosure investigates the use of IMC by leveraging non-volatile memristor technology to integrate storage and computation within the same physical location. The field of IMC shows immense promise for achieving energy-efficient, high-performance computing, particularly through the application of STT-MTJs. As non-volatile devices, STT-MTJs offer an ideal pairing with CMOS technology for arithmetic operations due to their high density, low power consumption, and durability. The combination is poised to drive advancements in IMC applications. Our study focuses on a 2T-1MTJ IMC approach that uses STT-MRAM not only for data storage but also for executing Boolean arithmetic within the memory array. By implementing a novel PP adder design based on majority logic, we introduce a mapping technique that minimizes gate use, thereby reducing logical depth and latency. Furthermore, our proposed mapping technique enhances parallelism by enabling computations across both rows and columns, significantly reducing energy consumption and area requirements while optimizing latency within the memory array.
ADVANTAGES OF THE PRESENT DISCLOSURE
[0059] The proposed disclosure integrates storage and computation within the same memory array, thereby significantly reducing data transfer energy and lowering overall energy consumption by minimizing the number of write operations to the array.
[0060] The present disclosure minimizes logical depth and reduces latency, thereby supporting high-speed computing applications.
[0061] The present disclosure reduces the number of required gates and memory cells, resulting in a more compact memory array area.
[0062] The present disclosure provides adaptability to various application demands, such as high-density and high-speed operations, thereby enhancing the versatility of the design for a range of computational tasks.
Claims:
1. A method (100c) for optimizing in-memory computation (IMC) within a memory array, comprising the steps of:
performing (102) binary addition of two binary numbers along with an initial carry-in within designated cells of the memory array;
optimizing the adder design (104) by minimizing majority gates and implementing them within the memory array, while utilizing efficient mapping configurations (achieving O1 and O3) to reduce area and energy consumption;
configuring (106) the memory mapping area for storing intermediate computational results, comprising:
a dynamic M.T-1 mapping technique wherein intermediate computation results are stored in a processing area determined by the most optimal mapping available, preventing data overwriting; and
a fixed 4×n processing area M.T-2 mapping technique, wherein n corresponds to a bit-width of the input numbers, allowing overwriting of intermediate results for enhanced space utilization;
setting (108) the memory array to a default logic "0" state, thereby ensuring the memory cells are in a parallel state before beginning computation;
assigning (110) initial inputs in the processing array, with designated cells set to logic "1" as per constraints C2 and C3, based on the selected optimized mapping strategy;
reading (112) majority outputs from the current computational level simultaneously, wherein row-wise and column-wise computations are used to enhance parallelism and reduce area, in compliance with constraints C1 and C3; and
writing (114) the outputs of each computation level to designated memory locations within the array, preparing these outputs for subsequent computation stages, wherein:
for the M.T-1 Mapping, outputs are written without overwriting existing data in the memory cells, maintaining intermediate values for concurrent calculations; and
for the M.T-2 Mapping, outputs are written by overwriting existing data within the processing area, optimizing area utilization within the constraints of the mapping technique; and
repeating the read and write operations until all computational levels have been completed.
2. The method (100c) as claimed in claim 1, wherein the M.T-2 mapping technique is employed to allow frequent rewriting of magnetic tunnel junction (MTJ) cells, thereby achieving optimal area utilization within the memory array.
3. The method (100c) as claimed in claim 1, wherein the M.T-1 mapping technique is utilized to reduce unnecessary data replacements, thus preserving the endurance of MTJ cells and enhancing performance.
4. The method (100c) as claimed in claim 1, wherein a majority-based LF-PP adder is implemented within the memory array by performing row-wise and column-wise simultaneous computations, with the following features:
majority gates arranged in a manner to minimize the number of gates required without increasing computational levels, achieving efficient in-memory computation; and
intermediate computation results deferred to subsequent computation levels when they are not essential for immediate carry calculations, further reducing the computational burden.
5. The method (100c) as claimed in claim 4, wherein the mapping techniques M.T-1 and M.T-2 support simultaneous row and column operations in the memory array by configuring the storage of values in specific cells as follows (200a):
column C1 configured to store values (A, B, C) in specific cells (Cell 00, Cell 03, Cell 06) with binary values (1, 0, 0); and
row R1 configured to store values (1, 1, 1) across cells (Cell 00, Cell 01, Cell 02), enabling bi-directional computation with minimal area and energy penalties.
6. The method (100c) as claimed in claim 1, wherein the performance of the mapping techniques is evaluated using the Cadence Virtuoso System Design Platform, simulated with a 45 nm CMOS generic process design kit (gpdk) and a state-of-the-art STT-MTJ model, under conditions including low resistance state (LRS) of 6.21 kΩ, high resistance state (HRS) of 16.8 kΩ, and supply voltage of 1.1 V.
7. The method (100c) as claimed in claim 1, wherein the optimized majority-based LF-PP adder design is structured to leverage consecutive majority gate executions at the first level, with each majority operation read as a distinct READ operation, and outputs are stored in predetermined locations to facilitate the next level of computation.
Documents
Name | Date |
---|---|
202441086517-COMPLETE SPECIFICATION [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-DECLARATION OF INVENTORSHIP (FORM 5) [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-DRAWINGS [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-EDUCATIONAL INSTITUTION(S) [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-EVIDENCE FOR REGISTRATION UNDER SSI [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-FORM 1 [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-FORM FOR SMALL ENTITY(FORM-28) [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-FORM-9 [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-POWER OF AUTHORITY [09-11-2024(online)].pdf | 09/11/2024 |
202441086517-REQUEST FOR EARLY PUBLICATION(FORM-9) [09-11-2024(online)].pdf | 09/11/2024 |