Efficient Iterative Detection Based on Conjugate Gradient and Successive Over-Relaxation Methods for Uplink Massive MIMO Systems

As a crucial aspect of fifth-generation (5G) mobile communications systems, massive multiple-input multiple-output (mMIMO) architectures are expected to help achieve the target key performance indicators. However, the huge number of antennas used in such systems makes it difficult to compute the inverse of the channel matrix relied upon by several detection methods, which hampers accurate estimation of the transmitted symbols. In this paper, the conjugate gradient (CG) and successive over-relaxation (SOR) methods are combined into a new iterative approach that avoids the matrix inversion. The suggested approach for uplink mMIMO detection is based on a joint cascade structure of both iterative methods. The CG method is applied first to obtain the initial solution, followed by the SOR method in the remaining iterations, resulting in an algorithm with robust performance and low computational complexity. Furthermore, the new hybrid scheme depends on the relaxation parameter, whose value has a great impact on error performance and must therefore be chosen optimally. Numerical simulations indicate that the proposed detector outperforms the CG and SOR detectors, achieves close-to-optimal performance, requires fewer iterations, and reduces complexity.


Introduction
5G wireless communication networks are one of the most significant developments in contemporary technology, as they have contributed to the realization of Intelligent Internet of Everything (IIoE), bringing profound changes to people's lives by increasing network traffic, creating numerous industry verticals, facilitating the functioning of the entire society and making the digital world more connected. This, in turn, has led to the development of several mMIMO system-based uplink (UL) and downlink (DL) schemes that serve as one of the key features of all advanced cellular wireless systems, with a particular emphasis on 5G [1]. The 3rd Generation Partnership Project (3GPP) will continue creating roadmaps in order to study and add functionalities to the new releases of the new radio (NR) air interface, with the ultimate purpose of improving 5G performance in terms of network coverage, mobility, MIMO evolution, and positioning [2]. The role of high-resolution channel state information (CSI) will be increasing as well to facilitate operation of systems that rely on artificial intelligence (AI) and machine learning technologies to provide data-driven, intelligent network solutions [3] anticipated to achieve the target key performance indicators (KPIs) [4], [5].
The mMIMO technology is capable of utilizing hundreds to thousands of antennas operating in the millimeter-wave band of the spectrum (between 30 GHz and 300 GHz) to improve network throughput and capacity by offering a wider bandwidth, reduce end-to-end latency, increase reliability, enhance spectral and energy efficiencies, improve spatial diversity, etc. Furthermore, such a system aims to overcome the challenges posed by constraining factors, such as hardware cost, power consumption, and signal detection-related complexities, especially as the number of antennas used increases [6]. Signal detection is an important process used in mMIMO UL. The base station (BS) needs instant and accurate information about the CSI to perform precoding in DL and detection in UL [7]. For networks with numerous users and channels that are spatially correlated, more advanced detection algorithms are needed in order to increase spectral efficiency. The complexity of these UL mMIMO detection algorithms is affected by the number of antennas at both the receiving and transmitting ends. This impacts the efficiency with which they can complete the multiplication and inversion of large dimension matrices. Hence, a good balance should be found between performance and complexity [8].
In the literature, signal detection algorithms for UL mMIMO systems can be classified into two categories: linear detection algorithms and their non-linear counterparts. The two approaches differ in the methods applied to compute the solutions. Linear algorithms are usually less complex than non-linear ones, but they can still be costly, as they involve the inversion of high-dimensional matrices formed from a very large-scale system [6].
The optimum restoration, at the receiver, of the sent symbols that are distorted by flat fading channels, interference, and noise is a difficult issue for mMIMO systems, as it requires advanced signal processing techniques to perform their equalization. Among conventional methods, the maximum likelihood (ML) detector is optimal and provides the best performance, but its computational complexity grows exponentially with the number of antennas [9], [10]. Therefore, sub-optimal detectors with reduced complexity are necessary to cope with the excessive complexity of optimal ML detection [11]. Linear detection algorithms offering close-to-optimal performance, such as zero-forcing (ZF) and minimum mean squared error (MMSE), can be used as an alternative strategy. Still, they require a matrix inversion, which adds to computational complexity. Linear detectors significantly outperform the matched filter (MF) detector in an mMIMO system with a finite number of BS antennas and a comparatively small number of users [12]. Their bit error rate (BER) performance is nevertheless considerably inferior to that of ML, particularly at high signal-to-noise ratio (SNR) values. Many iterative methods for approximating or avoiding matrix inversion have been proposed as a solution, such as the Neumann series (NS) method [13], the Newton iteration (NI) method [14], the Gauss-Seidel (GS) method [8], the Jacobi (JA) method [15], the Richardson (RI) method [16], the CG method [17], and the SOR method [18]. The complexity of iterative procedures grows with the number of iterations. However, the CG and SOR methods maintain satisfactory performance at high loading factors of the mMIMO system and retain a low level of complexity with respect to the size of the quadrature amplitude modulation (QAM) constellation.
Most current research focuses on combining different methods to maintain the detectors' performance and reduce their complexity. Among the many works related to our approach, one example is the study in [13], which presents an mMIMO signal detection algorithm combining two iterative methods (JA and GS), with the former used to initialize the latter in order to create a detector with minimal complexity for a UL mMIMO system.
The authors of [19] based their approach on a combination of the steepest descent (SD) and JA methods, where SD is employed to obtain an efficient search direction for the subsequent JA iteration and to speed up convergence. The study presented in [20] focused on a hybrid detector that combines MMSE, the alternating direction method of multipliers (ADMM), and the GS method. MMSE and ADMM are used in the initialization, while the GS method is used in the estimation. The combination of BLAST algorithms with other detection methods has shown promise in decreasing complexity and boosting massive MIMO detection efficiency. In [21], the researchers presented a hybrid detection approach that combines BLAST with an iterative detection algorithm. Such a scheme utilizes a BLAST-based initial detection, followed by a low-complexity iterative detection algorithm for further refinement. Compared to previous state-of-the-art detection approaches, the suggested hybrid scheme delivers considerable performance gains while requiring less computational complexity.
In this paper, we select the CG and SOR methods, linked serially, to build a new algorithm for mMIMO BS detectors. The CG method is used to compute the initial solution of the resulting CG-SOR-based detector, and the SOR iteration method, with an optimally chosen relaxation parameter, is then used to achieve high BER performance and lower complexity.
The rest of this paper is organized as follows. Section 2 introduces the uplink massive MIMO system model. Section 3 describes the conventional linear detection approach. Section 4 clarifies, in detail, the iterative methods for massive MIMO detection, including the proposed approach. Section 5 contains numerical results and a discussion. Finally, some conclusions are presented in Section 6.
Note: In this paper, italic capital letters and lowercase letters represent matrices (e.g. $A$) and vectors (e.g. $a$), respectively. The superscripts in $A^{-1}$, $A^{T}$, $A^{H}$ and $A^{\dagger}$ indicate the inverse, the transpose, the Hermitian transpose and the pseudo-inverse of $A$, respectively. Furthermore, $I_U$ denotes the $U \times U$ identity matrix and $|\cdot|$ denotes the absolute value operator. We denote by $\|\cdot\|$ and $\|\cdot\|_F$ the Euclidean norm of a vector and the Frobenius norm of a matrix, respectively. The functions diag(.), tril(.) and triu(.) return the diagonal, lower triangular, and upper triangular matrices, respectively.

Uplink Massive MIMO System Model
In order to study UL transmissions, we consider an mMIMO BS with $N$ antenna elements that serves $U$ single-antenna users simultaneously, where $N \gg U$. The modulated symbol vector $s = [s_1, s_2, \ldots, s_U]^T \in \mathbb{C}^{U \times 1}$ denotes the signal transmitted by all users, and the symbol vector $y = [y_1, y_2, \ldots, y_N]^T \in \mathbb{C}^{N \times 1}$ represents the signal received at the BS. The channel between the $U$ users and the $N$ BS antennas forms a channel matrix $H \in \mathbb{C}^{N \times U}$ defined by a set of flat Rayleigh fading complex coefficients, whose elements are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and unit variance ($\sigma^2 = 1$).
For the theoretical analysis of mMIMO systems, the flat (frequency non-selective) Rayleigh fading channel model is widely adopted and commonly considered accurate. It assumes no correlation or mutual coupling between the transmitting or receiving antennas. Favorable propagation is the most important property of this model in mMIMO systems.
The relationship between $s$, $y$ and $H$ can be modeled as:
$$y = Hs + n, \tag{1}$$
where $n \in \mathbb{C}^{N \times 1}$ is the i.i.d. complex Gaussian noise vector with zero mean and variance $\sigma_n^2$. Note that CSI is known at the receiver.
The system model of mMIMO is shown in Fig. 1.
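The model of Eq. (1) can be illustrated with a minimal, dependency-free Python sketch. The function names, the QPSK alphabet and the seed are illustrative assumptions and are not taken from the paper, whose simulations use Matlab and QAM constellations.

```python
import random


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample CN(0, variance)."""
    std = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def simulate_uplink(N, U, noise_var, seed=0):
    """Return (H, s, y) with y = H s + n for an N x U i.i.d. Rayleigh channel.

    Assumption: unit-energy QPSK symbols stand in for the QAM alphabet.
    """
    rng = random.Random(seed)
    qpsk = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
    # Channel entries are i.i.d. CN(0, 1), one row per BS antenna.
    H = [[complex_gaussian(rng, 1.0) for _ in range(U)] for _ in range(N)]
    s = [rng.choice(qpsk) for _ in range(U)]
    y = []
    for row in H:
        noise = complex_gaussian(rng, noise_var)
        y.append(sum(h * x for h, x in zip(row, s)) + noise)
    return H, s, y


# Example: a small 16 x 4 configuration (the paper uses 128 BS antennas).
H, s, y = simulate_uplink(N=16, U=4, noise_var=0.1)
```

Setting `noise_var=0.0` gives the noiseless case, which is convenient for checking a detector before adding noise.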

Conventional Linear Detection
MIMO signal detection seeks to recover the sent vector $s$ from the received vector $y$. The maximum likelihood (ML) detector, which searches exhaustively over all candidate symbol vectors, is the ideal algorithm; it is formulated in Eq. (2) [13], where $\hat{s}$, SNR and $\chi$ are the estimated signal, the signal-to-noise ratio, and the modulation alphabet size, respectively.
For an $N \times U$ mMIMO system with symbols from the $M$-QAM constellation alphabet, the exhaustive search of Eq. (2) has to examine $M^U$ candidate vectors, so the computational complexity grows exponentially with the number of transmitters $U$ and rapidly with the constellation size $M$. SNR is defined as $U E_s / \sigma^2$, where $E_s$ is the average transmit power per symbol.
The accuracy of the noise power estimate affects mMIMO signal detection performance, because any uncertainty in the noise power $\sigma^2$ translates into uncertainty in the SNR. Therefore, it is important to reduce the uncertainty in noise power estimation to ensure optimal performance of mMIMO signal detection.
ML detection is characterized by exponential complexity, which renders it unsuitable for mMIMO systems [13].
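To make the exponential growth concrete, the sketch below implements a brute-force ML search in plain Python. It assumes the unscaled criterion of minimizing $\|y - Hs\|^2$ for the model of Eq. (1); the exact scaling used in Eq. (2) of the paper may differ, and the function names are hypothetical.

```python
from itertools import product


def ml_detect(H, y, alphabet):
    """Exhaustive search for the s in alphabet^U minimizing ||y - H s||^2."""
    U = len(H[0])
    best, best_metric = None, float("inf")
    for cand in product(alphabet, repeat=U):
        # Squared Euclidean distance between y and the noiseless H * cand.
        metric = sum(
            abs(yi - sum(h * c for h, c in zip(row, cand))) ** 2
            for row, yi in zip(H, y)
        )
        if metric < best_metric:
            best, best_metric = cand, metric
    return list(best)


def ml_candidates(M, U):
    """Number of candidate vectors visited by the exhaustive search."""
    return M ** U
```

For 64-QAM and $U = 16$ users, `ml_candidates(64, 16)` equals $2^{96}$, which is why the exhaustive search is only usable for toy dimensions.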

Matched Filter (MF) Detector
The matched filter is one of the most attractive detectors for mMIMO, as it treats interference from other substreams as pure noise. When the number of users $U$ is substantially lower than the number of BS antennas $N$, MF performs adequately, but as $U$ increases, MF underperforms compared to more complex detectors. Maximum ratio combining (MRC) is an alternative name for MF, which seeks to maximize the received SNR for each stream by ignoring the influence of multi-user interference [8]. The mathematical expression of the MF detector is given in Eq. (3), where the slicer $S(\cdot)$ determines the closest symbol to the MF output.
The MF output of $y$ is represented by:
$$y_{MF} = H^H y. \tag{4}$$
By MRC, the equalized symbol $\hat{s}$ is then given by Eq. (5).
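A minimal sketch of matched-filter detection follows. The matched-filter output is $H^H y$; dividing each stream by the column energy $\|h_u\|^2$ before slicing is an assumption made here so that the slicer sees correctly scaled symbols, and it is not necessarily the normalization used in the paper. All names are illustrative.

```python
def matched_filter(H, y):
    """Return y_MF = H^H y for an N x U channel stored as a list of rows."""
    U = len(H[0])
    return [sum(row[u].conjugate() * yi for row, yi in zip(H, y)) for u in range(U)]


def mf_detect(H, y, alphabet):
    """Slice each matched-filter output to the nearest alphabet symbol."""
    estimates = []
    for u, value in enumerate(matched_filter(H, y)):
        gain = sum(abs(row[u]) ** 2 for row in H)  # column energy ||h_u||^2
        scaled = value / gain                      # assumed per-stream normalization
        estimates.append(min(alphabet, key=lambda a: abs(a - scaled)))
    return estimates
```

With orthogonal channel columns the multi-user interference vanishes and the sliced output equals the transmitted vector; with correlated columns the residual interference is simply treated as noise.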

Zero-Forcing (ZF) Detector
The receiver based on the zero-forcing criterion is the simplest linear detector: it inverts the channel matrix, assuming that $H$ is invertible. When $H$ is ill-conditioned, the ZF detector still gives correct results at high SNR, but at low SNR its performance is strongly affected by noise. In practice, if $H$ is not square (i.e. if $U \neq N$), the received vector must be multiplied by the pseudo-inverse of the channel matrix to recover all transmitted symbols. So, the estimated vector is:
$$\hat{s} = H^{\dagger} y, \tag{6}$$
where $H^{\dagger}$ is the pseudo-inverse of the channel matrix.
To mitigate the noise enhancement introduced by the ZF detector, an MMSE detector has been proposed, where the noise variance is considered in the construction of the filtering matrix.

Minimum Mean-Square Error (MMSE) Detector
The objective of the MMSE detector in mMIMO is to reduce the mean square error caused by noise and inter-symbol interference (ISI). The estimated signal vector coming from the $U$ different users can be represented by:
$$\hat{s} = W^{-1} y_{MF}, \tag{7}$$
that is, in terms of the matched filter output $y_{MF}$ defined in Eq. (4) and the MMSE weighting matrix $W$, which is expressed by Eq. (8). We note that computing the inverse of $W$ directly is extremely costly, and the problem gets even worse as the size of $W$ grows. As a result, numerous iterative strategies for approximating or avoiding the inverse of $W$ have been proposed [13], [22]. The MMSE detector is a practical and efficient solution that can achieve near-optimal performance in mMIMO scenarios where the channel state information (CSI) is accurately known. In comparison to the ML detector, which is theoretically optimal but computationally complex and not always practical, the MMSE detector is considered to be an optimal combining detector, due to its good performance and practicality. Overall, the MMSE detector provides a balance between performance and complexity that makes it a popular choice for many practical mMIMO systems.
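The two quantities that every iterative detector in the following sections operates on can be formed as in the sketch below. It assumes the common regularized Gram form $W = H^H H + \sigma_n^2 I_U$; the exact regularization term in Eq. (8) of the paper may be scaled differently (for example by the symbol energy), and the function name is hypothetical.

```python
def mmse_system(H, y, noise_var):
    """Return (W, y_MF) for the linear system W s = y_MF.

    Assumption: W = H^H H + noise_var * I_U (regularized Gram matrix).
    """
    U = len(H[0])
    W = [
        [
            sum(row[i].conjugate() * row[j] for row in H)
            + (noise_var if i == j else 0.0)
            for j in range(U)
        ]
        for i in range(U)
    ]
    # Matched-filter output y_MF = H^H y.
    y_mf = [sum(row[i].conjugate() * yi for row, yi in zip(H, y)) for i in range(U)]
    return W, y_mf
```

Forming $W$ costs on the order of $N U^2$ multiplications and is shared by all the detectors compared here; the detectors differ in how they solve $W \hat{s} = y_{MF}$ afterwards.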

Conjugate Gradient (CG) Method
Conventional CG is one of the most effective methods for solving linear equations without matrix inversion.
The CG method is implemented as an iterative algorithm, and it is a member of the class of Krylov subspace methods [8].
In CG, the estimated signal can be obtained as follows:
$$\hat{s}^{(k+1)} = \hat{s}^{(k)} + \alpha^{(k)} p^{(k)}, \tag{9}$$
where $\hat{s}^{(k)}$ and $\hat{s}^{(k+1)}$ are the approximations of $\hat{s}$ in the $k$-th and $(k+1)$-th iterations, respectively, and $p^{(k)}$ is the conjugate direction in relation to the MMSE filtering matrix $W$, i.e.:
$$(p^{(k)})^H W p^{(j)} = 0, \quad k \neq j, \tag{10}$$
and $\alpha^{(k)}$ is a scalar parameter which can be calculated as:
$$\alpha^{(k)} = \frac{(r^{(k)})^H r^{(k)}}{(p^{(k)})^H W p^{(k)}}. \tag{11}$$
Let us define the residual $r^{(k)}$ at the $k$-th iteration as:
$$r^{(k)} = y_{MF} - W \hat{s}^{(k)}, \tag{12}$$
and
$$r^{(k+1)} = r^{(k)} - \alpha^{(k)} W p^{(k)}. \tag{13}$$
Each subsequent step (for $k + 1$) is a linear combination of the next residual $r^{(k+1)}$ and the current conjugate direction $p^{(k)}$:
$$p^{(k+1)} = r^{(k+1)} + \beta^{(k)} p^{(k)}, \tag{14}$$
where the scalar for the linear combination is:
$$\beta^{(k)} = \frac{(r^{(k+1)})^H r^{(k+1)}}{(r^{(k)})^H r^{(k)}}. \tag{15}$$
Algorithm 1 summarizes the CG method for mMIMO signal detection. According to Algorithm 1, the computational complexity of the $k$-th iteration is reduced from $O(U^3)$ to $O(U^2)$. The results show that the CG algorithm outperforms state-of-the-art algorithms and attains near-MMSE performance for mMIMO systems with a small number of iterations [23]. Based on this feature, we chose to use the CG method to initialize the proposed scheme in the initial iterations, relying on its performance and numerical stability.
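The CG recursion described above can be sketched in plain Python as follows. This is the textbook conjugate gradient iteration for a Hermitian positive definite system $W \hat{s} = y_{MF}$ started from $\hat{s}^{(0)} = 0$; it is not a transcription of the paper's Algorithm 1, and the helper names are illustrative.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def cg_solve(W, y_mf, iterations):
    """Run a fixed number of CG iterations on W s = y_mf, starting from s = 0."""
    s = [0j] * len(y_mf)
    r = list(y_mf)            # r(0) = y_MF - W s(0) = y_MF
    p = list(r)               # first search direction p(0) = r(0)
    rr = inner(r, r).real
    for _ in range(iterations):
        if rr == 0.0:         # residual is exactly zero: already solved
            break
        Wp = matvec(W, p)     # the only O(U^2) operation per iteration
        alpha = rr / inner(p, Wp).real
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, Wp)]
        rr_next = inner(r, r).real
        beta = rr_next / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_next
    return s
```

Each iteration needs a single matrix-vector product, which is where the $O(U^2)$ per-iteration cost comes from. In exact arithmetic the iteration reaches the solution of a $U \times U$ system in at most $U$ steps.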

Successive Over-Relaxation (SOR) Method
The SOR method is one of the most important techniques for solving large linear systems, such as those expressed in Eq. (1). It improves and accelerates the results obtained with the GS method. The estimated signal vector using the SOR iteration method is:
$$\hat{s}^{(k+1)} = \left(\tfrac{1}{\omega} D + L\right)^{-1} \left[ y_{MF} + \left( \left(\tfrac{1}{\omega} - 1\right) D - L^H \right) \hat{s}^{(k)} \right], \tag{16}$$
where $\omega$ is called the relaxation parameter, affecting SOR convergence.
As demonstrated in [24], because the MMSE filtering matrix $W$ is symmetric positive definite for uplink mMIMO systems, we can decompose $W$ into its strictly lower triangular part $L$, its strictly upper triangular part $L^H$, and its diagonal part $D$.
The decomposed $W$ matrix is:
$$W = D + L + L^H. \tag{17}$$
The process of reducing residuals at each stage is called "successive relaxation". If $0 < \omega < 1$, the iterative method is known as "successive under-relaxation" and it can be used to obtain convergence when the GS algorithm is not convergent. For $\omega > 1$, the method is called "successive over-relaxation" and it is used to accelerate the convergence of GS iterations. SOR becomes a GS iteration when $\omega = 1$, so the GS method is a particular case of SOR. The aim is to choose $\omega$ such that the convergence rate is maximized, i.e. such that the spectral radius of the iteration matrix is as small as possible. The SOR iteration given in Eq. (16) can be rewritten as [24]:
$$\hat{s}^{(k+1)} = B_{\omega} \hat{s}^{(k)} + e, \tag{18}$$
where $B_{\omega}$ represents the iteration matrix given by:
$$B_{\omega} = \left(\tfrac{1}{\omega} D + L\right)^{-1} \left( \left(\tfrac{1}{\omega} - 1\right) D - L^H \right), \tag{19}$$
and $e$ is an iteration vector given by:
$$e = \left(\tfrac{1}{\omega} D + L\right)^{-1} y_{MF}. \tag{20}$$
The iteration in Eq. (18) converges if the spectral radius satisfies $\rho(B_{\omega}) < 1$, which is possible only if $0 < \omega < 2$. The authors of [25] demonstrated a simple method for computing the quasi-optimal relaxation parameter of the SOR detector used in practical mMIMO system configurations. This method was proven to be optimal and depends only on the loading factor, which involves the number of users and the number of BS antennas. The optimal relaxation parameter for the SOR detector is given by Eq. (21), and the quasi-optimal relaxation parameter $\tilde{\omega}_0$ for a SOR-based detector by Eq. (22), where $\psi = N/U$ denotes the loading factor. From Eq. (22) we can observe that once $N$ and $U$ are fixed, $\tilde{\omega}_0$ does not need to be computed again even when $H$ and $W$ change [25]. The above method is summarized in Algorithm 2. The SOR equation defined by Eq. (16) can also be written in the equivalent form of Eq. (23).
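A compact way to see how the relaxation parameter enters the iteration is the element-wise SOR sweep sketched below. It uses the standard textbook update, which solves the same triangular system as the matrix form without forming any inverse; the function name and the optional starting vector `s0` are illustrative assumptions.

```python
def sor_solve(W, y_mf, omega, iterations, s0=None):
    """Run SOR sweeps on W s = y_mf with relaxation parameter omega.

    omega = 1 reduces to Gauss-Seidel; for a Hermitian positive definite W
    the sweep converges for 0 < omega < 2.
    """
    U = len(y_mf)
    s = list(s0) if s0 is not None else [0j] * U
    for _ in range(iterations):
        for i in range(U):
            # s[j] already holds the new value for j < i and the old one for j > i.
            sigma = sum(W[i][j] * s[j] for j in range(U) if j != i)
            gauss_seidel = (y_mf[i] - sigma) / W[i][i]
            # Blend the previous value and the Gauss-Seidel value.
            s[i] = (1 - omega) * s[i] + omega * gauss_seidel
    return s
```

Because the update is done in place, no explicit triangular solve or matrix inverse is needed, and one sweep costs on the order of $U^2$ multiplications.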

Proposed Method
In this work, we combine the advantages of two iterative methods, CG and SOR, to design a low-complexity UL mMIMO detector. Figure 2 shows a block diagram of the proposed detector based on the CG-SOR algorithm. The proposed detector is constructed from two cascaded stages, initialization and final estimation:
1) Initialization stage. To exploit the good performance and numerical stability of the CG method in its first iterations, the initial estimate is obtained from the first CG iteration ($k = 1$) as follows:
- initialize the scalar parameter of the CG method:
$$\alpha^{(0)} = \frac{(y_{MF})^H y_{MF}}{(y_{MF})^H W y_{MF}},$$
- apply the first iteration of the CG method:
$$\hat{s}^{(1)} = \alpha^{(0)} y_{MF}.$$
2) Final estimation stage. Proceed to the final solution with the remaining iterations by applying the SOR method, i.e. by performing $(k - 1)$ SOR iterations, where $k \geq 2$.

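Fig. 2. Block diagram of the proposed CG-SOR detector. Initialization: from $y$, compute the initial scalar parameter $\alpha^{(0)}$ of the CG method and apply the first iteration of the CG method to obtain $\hat{s}^{(1)}$. Final estimation: apply the SOR method.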
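The two stages of the proposed detector can be sketched as a single function. This is a minimal illustration under stated assumptions, not the paper's Algorithm 3: the first CG step is taken from a zero starting vector, the SOR stage uses the standard element-wise sweep, and the function name and argument order are hypothetical.

```python
def cg_sor_detect(W, y_mf, omega, k):
    """One CG step for initialization, then (k - 1) SOR sweeps.

    Assumes y_mf is non-zero and W is Hermitian positive definite.
    """
    U = len(y_mf)
    # Initialization stage: alpha(0) = y^H y / (y^H W y), s(1) = alpha(0) * y.
    Wy = [sum(W[i][j] * y_mf[j] for j in range(U)) for i in range(U)]
    num = sum(abs(v) ** 2 for v in y_mf)
    den = sum(y_mf[i].conjugate() * Wy[i] for i in range(U)).real
    alpha0 = num / den
    s = [alpha0 * v for v in y_mf]
    # Final estimation stage: the remaining k - 1 iterations use SOR.
    for _ in range(k - 1):
        for i in range(U):
            sigma = sum(W[i][j] * s[j] for j in range(U) if j != i)
            s[i] = (1 - omega) * s[i] + omega * (y_mf[i] - sigma) / W[i][i]
    return s
```

With `k = 1` the function returns the CG-initialized estimate only; larger `k` refines it with SOR sweeps. A slicer to the nearest constellation point would follow in a full detector.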

Numerical Results and Discussion
In this section, we assess and compare the computational complexity of the proposed approach with that of other well-known algorithms described in the literature, as well as analyze the detection performance for a multi-user massive MIMO uplink system configuration.

Complexity Analysis
To analyze the computational complexity of the proposed approach and compare it with other recently presented algorithms, we count only multiplication and division operations, which dominate the complexity, while addition and subtraction operations are neglected [26]. Firstly, we analyze the computational complexity of the initial estimation. From Algorithm 3, the computational complexity of the first iteration of the CG method is related to computing $\alpha^{(0)}$ and $\hat{s}^{(1)}$. Computing $\alpha^{(0)}$ consists of one multiplication of the $(1 \times U)$ vector $(y_{MF})^H$ by the $(U \times 1)$ vector $y_{MF}$, together with a multiplication of the $(1 \times U)$ vector $(y_{MF})^H$, the $(U \times U)$ matrix $W$ and the $(U \times 1)$ vector $y_{MF}$, so $U^2 + 2U$ multiplications are needed. A further $U$ multiplications are necessary to calculate $\hat{s}^{(1)}$. To sum up, $U^2 + 3U$ multiplications are required for the initial estimation. Secondly, for the final estimation, the two terms of the SOR update in Eq. (23) require 1 and $(2U + 1)$ multiplications, respectively. Thus, the computation of each element of $\hat{s}^{(k)}$ requires $(2U + 2)$ multiplications. As there are $2U$ elements in $\hat{s}^{(k)}$, the number of multiplications per SOR iteration is $(4U^2 + 4U)$. So, the overall number of multiplications needed for $k$ iterations of the SOR method is $4k(U^2 + U)$. Finally, the proposed CG-SOR detector requires $(U^2 + 3U)$ multiplications to initialize the detection and $4k(U^2 + U)$ multiplications to estimate the signal. Thus, the total computational complexity of the proposed algorithm is $(4k + 1)U^2 + (4k + 3)U$.

Method      Computational complexity
CG-SOR      $(4k + 1)U^2 + (4k + 3)U$

The proposed method meets the requirement of low complexity, which is determined by the number of users, BS antennas, and iterations used in the simulation. Furthermore, it requires a low number of iterations to achieve the expected performance.
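The multiplication counts derived above can be evaluated directly for a given configuration. The helper names below are illustrative; the formulas are the ones stated in the complexity analysis.

```python
def init_multiplications(U):
    """Multiplications for the CG initialization stage: U^2 + 3U."""
    return U ** 2 + 3 * U


def sor_multiplications(U, k):
    """Multiplications for k SOR iterations: 4k(U^2 + U)."""
    return 4 * k * (U ** 2 + U)


def cg_sor_multiplications(U, k):
    """Total for the CG-SOR detector: (4k + 1)U^2 + (4k + 3)U."""
    return (4 * k + 1) * U ** 2 + (4 * k + 3) * U


# Example: U = 32 users and k = 3 iterations.
total = cg_sor_multiplications(32, 3)  # 13792
```

For $U = 32$ and $k = 3$ the initialization contributes 1120 multiplications and the SOR stage 12672, so the SOR sweeps dominate the total.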

Performance Results
In this sub-section, we perform certain computer-based simulations in order to verify the detection performance of the proposed algorithm using the parameters listed in Table 2, and we compare BER performance of the proposed algorithm with the benchmark MMSE detector and the studied iterative algorithms, such as JA, CG, and SOR, with suitably chosen optimal parameters. To present the relationship between BER and average SNR, we use Matlab software based on Monte-Carlo simulations to reduce the time needed to perform the computations.

Simulation parameters       Type and value
Antennas at base station    128

In Fig. 3, the BER performance curve has the form of a parabola whose depth depends on the SNR value. The higher the SNR, the deeper the dip of the parabola. So, an efficient relaxation factor is determined by the minimum BER value at high SNRs. In this case, according to the graph, the optimal relaxation parameter $\omega_0 = 1.2$ is recommended. Figure 4 shows the BER performance of the SOR iterative algorithm versus the relaxation parameter. The SNR is fixed at 18 dB, and the number of iterations is $k = 3$ for the $N \times U = 128 \times 32$ configuration. The optimal relaxation parameter $\omega_0$ is determined by the minimum point of the graph. The minimum BER is achieved at $\omega$ of approximately 1.2.
In order to optimize the relaxation parameter of the SOR and the proposed CG-SOR methods, as shown in Fig. 3 and Fig. 4, we considered the 128×32 scenario in the third iteration with or four iterations are used for a 128 × 16 mMIMO system configuration with 64-QAM modulation constellations. When the number of iterations is increased from two to three and then to four, the shape of the BER curve reflects the quasi-optimal quality of the proposed detector. Figures 8-9 reveal that the advantages of the proposed CG-SOR algorithm are evident when the number of users increases (for a 128×32 mMIMO system configuration). The proposed CG-SOR algorithm is still capable of achieving near-MMSE performance, which is clearly superior to that of the JA, CG, and SOR algorithms for a small number of iterations.
Figure 10 illustrates the number of multiplications required by the proposed detector and other iterative methods. The CG method requires the fewest multiplications among the methods listed in Table 1, but it needs a large number of iterations at high SNR to obtain the best BER performance. This is one of the features that led us to use it to initialize the proposed algorithm. JA, SOR, and the proposed method require roughly the same number of multiplications, but differ in performance: the CG-SOR method outperforms them in terms of BER, as seen in the three graphs above, while offering low computational complexity.

Conclusion
In this paper, we developed an efficient, low-complexity detection algorithm for uplink signal detection in massive MIMO systems through the selection and joint use of two iterative methods. Compared with its constituent algorithms (CG and SOR) and other conventional linear detectors, the proposed detector achieved better BER performance in the simulated antenna configurations, while remaining robust at low SNR values and with a small number of iterations. Numerical simulations show that our CG-SOR algorithm is stable and offers robust performance approaching that of the near-optimal MMSE detector.