A Novel Kernel Algorithm for Finite Impulse Response Channel Identification

Over the last few years, kernel adaptive filters have gained in importance as the kernel trick has been applied to classic linear adaptive filters in order to address various regression and time-series prediction problems in nonlinear environments. In this paper, we study a recursive method for identifying finite impulse response (FIR) nonlinear systems from binary-valued observations. We apply the kernel trick to the recursive projection (RP) algorithm, yielding a novel recursive algorithm based on a positive definite kernel. For comparison purposes, our approach is evaluated against the RP algorithm in identifying the parameters of two channels: a frequency-selective fading channel, called the broadband radio access network (BRAN B) channel, and a theoretical frequency-selective channel, known as the Macchi channel. Monte Carlo simulation results are presented to show the performance of the proposed algorithm.


Introduction
Linear adaptive filters are a class of digital filters that can automatically adjust their parameters based on the input data. They are commonly used for signal processing tasks, such as noise reduction, echo cancellation, and equalization [1]-[4]. The basic idea behind linear adaptive filters is to use an algorithm that updates the filter coefficients in response to changes in the input signal [5], [6]. The most widely used algorithm for this purpose is the least mean squares (LMS) algorithm [7], which iteratively adjusts the filter coefficients to minimize the mean squared error between the filter's output and the desired output. The performance of linear adaptive filters depends on several factors, including the choice of filter structure, the algorithm applied to update the filter coefficients, and the design of the input signal. In general, these filters work best when the input signal is stationary or varies slowly over time and when the filter structure is chosen to match the statistical properties of the signal.
One of the most challenging problems encountered in engineering digital communication systems is minimizing the impact of the communication channel. The most common solution consists in estimating the channel's impulse response parameters, and then using an equalizer to compensate for the channel (see e.g. [8]-[12]). This approach relies strongly on the quality of the estimation, also known as system identification. System identification (SI) is an area of key interest in the field of automatic control. In SI, the aim is to build the most adequate mathematical models of dynamic systems based on experimental data, i.e., using measurements of the system input/output (IO) signals [13]-[16]. At present, binary-valued observation systems are receiving a great deal of attention [17]-[22] thanks to the extensive applications of binary-valued sensors, including asynchronous transfer mode (ATM), Hall-effect sensors for velocity and acceleration, switching sensors for exhaust-gas oxygen level, industrial sensors for brushless DC motors, photoelectric position sensors, etc. [23], [24]. The input (or output) monitored in these systems cannot be directly measured. Instead, the only available information is whether the input (or output) of the system is above or below a specific numerical value, called a threshold (a key factor for binary-valued systems); this information can be measured and used to implement a system controller. For binary-valued observation systems, several important results on the recursive identification of single-input single-output (SISO) finite impulse response (FIR) communication channels, the identification of infinite impulse response (IIR) systems, and state estimation problems are presented in [25]-[28]. Over the last few years, a technique based on kernel adaptive filtering (KAF) [29], [30] has been employed in a wide range of telecommunication applications.
KAF was proposed as a way to overcome the limitations of classic linear filtering techniques when handling nonlinear systems. Linear filtering techniques, such as the LMS algorithm and its normalized variant (the NLMS algorithm) [5], assume that the desired output is a linear function of the input signal, which may not be accurate in many real-world applications. KAF combines kernel methods, which map the data to a higher-dimensional space where linear filtering can be carried out more accurately, with adaptive filtering techniques that handle nonlinearity and non-stationarity in the input signal [31]. The kernel function can be chosen to match the input signal's properties, enabling KAF to adapt to changes in the signal's properties over time.
One of the key advantages of kernel-based methods is that they can work with non-linearly separable data. They are based on the principle that a nonlinear decision boundary in the original low-dimensional input space can be represented as a linear boundary in a high-dimensional reproducing kernel Hilbert space (RKHS) [32]. This allows kernel methods to effectively handle complex, nonlinear relationships between the input features and the output variable. Kernel methods are an emerging technique for machine learning applications, e.g. regularization networks [33], Gaussian process regression (GPR) [34], and support vector machines (SVMs) [35], as well as for nonlinear signal processing and classification.
At present, many adaptive kernel filtering algorithms have been described in the literature. Some of them include the following: kernel affine projection algorithms (KAPA) [36], kernel principal component analysis (KPCA) [37], kernel least mean squares (KLMS) [38], and kernel recursive least squares (KRLS) [39]. To increase the robustness of these adaptive kernel filtering algorithms, many different variants have been developed, such as quantized kernel recursive least squares (QKRLS) [40], quantized kernel least mean square (QKLMS) [41], extended kernel recursive least squares (Ex-KRLS) [42], kernel least mean square with adaptive kernel size (KLMS-AKS) [43], random Fourier feature kernel recursive least squares (RFF-KRLS) [44], quantized kernel least lncosh (QKLL) [45], and kernel extended improved proportionate normalized least mean square algorithm (KE-IPNLMS) [46]. In this paper, we address the recursive identification problem of finite impulse response (FIR) single-input single-output (SISO) communication channels using kernel techniques. Specifically, the method proposed by Guo and Zhao in [26] for linear systems with binary-valued observations is extended to the general case of nonlinear system identification using kernel methods. Several simulation results, both under noisy environments and for diverse data lengths N, are provided to demonstrate the accuracy of the proposed kernel method. Our proposed algorithm is referred to as the KRPI algorithm.
The paper is organized as follows. In Section 2, we describe the configuration of the nonlinear system identification problem based on binary-valued output observations. In Section 3, we briefly present the recursive projection (RP) algorithm. In Section 4, we review some fundamental notions of kernel approaches. The proposed kernel recursive projection identification (KRPI) algorithm for nonlinear systems with binary output measurements is explained in Section 5. Simulations that demonstrate the effectiveness of the proposed algorithm are shown in Section 6. Finally, Section 7 gives a brief discussion and concludes the paper.

System Descriptions and Assumptions
We consider the single-input single-output (SISO) nonlinear system (presented in Fig. 1) described by the following Eq. (1): where u(k), θ, and f(.) are the input sequence, the channel coefficients (parameter vector of size L), and the nonlinearity, respectively. The transpose operator is denoted by the superscript ⊤. d(k) is the system output corrupted by the measurement noise b(k). Using a binary detector I[.] equipped with a predefined threshold C ∈ R, the output of the system d(k) becomes measurable. The quantized output data s(k) can be expressed by: To make the system analysis easier and to obtain a tractable result, we assume, in this paper, that:
Our primary objective in this paper is to construct a recursive kernel identification algorithm for FIR systems by using positive definite kernels and binary observations s(k), in order to provide a recursive estimate of θ.

Recursive Projection (RP) Algorithm
A recursive projection algorithm for finite impulse response (FIR) system identification with binary-valued observations, under sufficiently rich inputs and a fixed quantization threshold, was presented by Guo and Zhao [26]. In this section, we briefly present this algorithm. Its goal is to estimate the parameter vector θ in real time. It is based on the following assumption:
Assumption 2. The measurement noise {b(k), k ≥ 1} is an i.i.d. sequence of random variables with zero mean and finite variance σ². F(.) and f(.) are the distribution and density functions of b(1), respectively. Note that F(.) and f(.) are assumed to be known.
The update equations of the RP algorithm are given by: where:
-θ̂(k) represents the estimate of θ at time k,
-Π_Ω is the projection operator onto a convex compact set Ω ⊆ R^L, which is defined as:
-β > 0 denotes a constant scalar that plays an important role in the convergence of θ̂(k).
In practice, the recursive projection algorithm breaks down when the noise distribution function is difficult to know (for instance, when the channel error is unbounded). Therefore, many extensions of this algorithm have been proposed; see, for example, [25], [47], [48].
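Since the update equations are not reproduced in this text, the following is only a minimal sketch of a recursive projection step under stated assumptions. It assumes the recursion θ̂(k) = Π_Ω{θ̂(k−1) + (β/k) φ(k) [F(C − φ(k)⊤θ̂(k−1)) − s(k)]}, a 0/1 coding s(k) = I[y(k) ≤ C], zero-mean Gaussian noise, and Ω taken as a box so that the projection reduces to clipping. The names `rp_identify`, `simulate`, and `project_box`, as well as all numerical values, are hypothetical.

```python
import math
import random


def gaussian_cdf(x, sigma):
    """Distribution function F(.) of zero-mean Gaussian noise with std sigma (assumed known)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def project_box(theta, radius):
    """Euclidean projection onto the box Omega = [-radius, radius]^L (assumed choice of Omega)."""
    return [max(-radius, min(radius, t)) for t in theta]


def rp_identify(u, s, L, C, beta, sigma, radius=2.0):
    """Assumed RP recursion: theta <- Proj(theta + (beta/k) * phi * (F(C - phi'theta) - s))."""
    theta = [0.0] * L
    k = 0
    for n in range(L - 1, len(u)):
        k += 1
        phi = [u[n - i] for i in range(L)]  # regressor [u(n), ..., u(n-L+1)]
        pred = sum(p * t for p, t in zip(phi, theta))
        innovation = gaussian_cdf(C - pred, sigma) - s[n]
        theta = [t + (beta / k) * p * innovation for t, p in zip(theta, phi)]
        theta = project_box(theta, radius)
    return theta


def simulate(theta_true, N, C, sigma, seed=0):
    """Linear FIR system with binary observations s = 1 if y <= C else 0 (assumed coding)."""
    rng = random.Random(seed)
    L = len(theta_true)
    u = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    s = [0] * N
    for n in range(L - 1, N):
        y = sum(theta_true[i] * u[n - i] for i in range(L)) + rng.gauss(0.0, sigma)
        s[n] = 1 if y <= C else 0
    return u, s


theta_true = [0.5, -0.3]  # hypothetical two-tap channel
u, s = simulate(theta_true, N=20000, C=0.5, sigma=0.5)
theta_hat = rp_identify(u, s, L=2, C=0.5, beta=10.0, sigma=0.5)
```

The step size β/k shrinks over time, so early observations move the estimate strongly while later ones only refine it; the projection keeps every iterate inside Ω.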

Kernel-based Adaptive Filters
Kernel methods are a class of machine learning algorithms that are widely used for classification, regression, identification, and other tasks [29], [30], [32]. They operate by transforming data into a higher-dimensional space in which it can be more easily separated. One of the key advantages of kernel methods is that they can work with non-linearly separable data. They are based on the principle that a nonlinear decision boundary in the original low-dimensional space can be represented as a linear boundary in a high-dimensional space. This allows kernel methods to effectively handle complex, nonlinear relationships between the input features and the output variable. This section gives a short background on the crucial idea behind KAF techniques. The reproducing kernel theory [49] has allowed many adaptive algorithms to evolve. The suitability of these approaches for training is founded on the concept of learning by error correction. To implement kernel methods over a measurement data input U, we only need the kernel function values for each pair of input data points. Usually, these values are stored in a square matrix named the kernel matrix (or the Gram matrix).
Definition 1. The Gram matrix, also known as the kernel matrix, is a matrix that summarizes the pairwise similarities between a set of data points using a kernel function. Specifically, given a set of N input data points u_1, u_2, ..., u_N, the Gram matrix K is an N × N matrix whose (i, j)-th entry is defined as K_ij = κ(u_i, u_j), where κ is the kernel function.
The Gram matrix is important in kernel methods, such as kernelized support vector machines, as it allows us to compute the dot products of data points in the feature space implicitly defined by the kernel function, without explicitly computing the feature vectors themselves. This is because the dot product of two feature vectors in the feature space is exactly the corresponding entry of the Gram matrix.
In the remainder of the document we denote, by κ, the reproducing kernel and by U the input space. Like the inner product, we can also expect the kernel function to be positive definite.

Definition 2.
A positive definite kernel is a function that takes pairs of inputs and produces a measure of their similarity. More formally, a kernel function κ is positive definite if, for any finite set of inputs u_1, u_2, ..., u_N, the corresponding kernel matrix K_ij = κ(u_i, u_j) is positive semi-definite, meaning that its eigenvalues are all non-negative or, equivalently, that for all real coefficients c_1, ..., c_N:
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j \kappa(u_i, u_j) \geq 0$$
In machine learning, positive definite kernels are commonly used in kernel methods, such as support vector machines and kernelized ridge regression [50]. They allow us to implicitly map the input data into a high-dimensional feature space, where linear methods can be applied even when the data is not linearly separable. Positive definite kernels have a number of desirable properties, including symmetry, and they can be used to define a notion of distance between data points that takes into account their similarity in the feature space.
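As an illustration of Definitions 1 and 2, the sketch below builds the Gram matrix of three hypothetical points and evaluates the quadratic form c⊤Kc for one coefficient vector. It assumes a Gaussian kernel with the common exp(−‖u − u′‖²/(2σ²)) scaling; the function names and the sample points are illustrative only.

```python
import math


def gaussian_kernel(u, v, sigma):
    """Gaussian (RBF) kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """N x N matrix with entries K[i][j] = kappa(u_i, u_j)."""
    return [[gaussian_kernel(ui, uj, sigma) for uj in points] for ui in points]


def quadratic_form(K, c):
    """c' K c, which is non-negative for every c when K is positive semi-definite."""
    n = len(c)
    return sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical input points
K = gram_matrix(points, sigma=0.5)
q = quadratic_form(K, [1.0, -1.0, 0.5])
```

Checking one coefficient vector does not prove positive definiteness; it only shows what the condition in Definition 2 evaluates.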
Common examples of positive definite kernels include the Gaussian (RBF) kernel adopted in Section 5. Symmetric and positive definite scalar functions, often referred to simply as "kernels", are more precisely called "Mercer kernels". This name comes from Mercer's theorem.

Theorem 1.
Mercer's theorem is a fundamental result in kernel methods which provides the necessary and sufficient condition for a function to be a valid positive definite kernel function. Specifically, Mercer's theorem states that a continuous and symmetric function κ(u, u′) on a compact domain is a valid positive definite kernel function if and only if it can be expressed as:
$$\kappa(u, u') = \sum_{i=1}^{\infty} \zeta_i \Phi_i(u) \Phi_i(u')$$
where ζ_i and Φ_i are the eigenvalues and eigenfunctions of the integral operator T_κ : L²(U) → L²(U), defined by:
$$(T_\kappa g)(u) = \int_{U} \kappa(u, u')\, g(u')\, du' \qquad (11)$$
and the series converges absolutely and uniformly.
In simpler terms, Mercer's theorem says that a function κ(u, u′) is a valid positive definite kernel function if and only if it can be written as a weighted sum of products of the feature functions Φ_i(u) and Φ_i(u′), with non-negative weights ζ_i.
Mercer's theorem is important in kernel methods because it provides a way to construct valid kernel functions for a given problem, by finding a suitable set of feature functions and their corresponding weights that satisfy the conditions of the theorem. It also provides a theoretical foundation for the effectiveness of kernel methods, by showing that any positive definite kernel function can be used to implicitly map the input data into a high-dimensional feature space, where linear methods can be applied effectively even when the data is not linearly separable in the original space.
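The implicit mapping described above can be checked on a small case where the feature map is known in closed form. The sketch below uses the homogeneous polynomial kernel of degree 2 on R², which is not the kernel used in this paper and is chosen only because its feature map Φ(u) = (u₁², √2·u₁u₂, u₂²) is finite-dimensional; all names are illustrative.

```python
import math


def poly2_kernel(u, v):
    """Homogeneous polynomial kernel of degree 2 on R^2: (u . v)^2."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2


def poly2_features(u):
    """Explicit feature map Phi with Phi(u) . Phi(v) = (u . v)^2."""
    return (u[0] ** 2, math.sqrt(2.0) * u[0] * u[1], u[1] ** 2)


def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


u, v = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(u, v)                          # evaluated in the input space
explicit = dot(poly2_features(u), poly2_features(v))   # inner product in the feature space
```

Both quantities agree up to rounding, which is the kernel trick: the inner product in the feature space is obtained without ever forming Φ(u). For the Gaussian kernel the feature space is infinite-dimensional, so only the implicit route is available.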
In a reproducing kernel Hilbert space, the mapping representation Φ is constructed as follows: To represent the elements of H by their coordinates, the Hilbert space H must be equipped with an orthonormal basis. The associated kernel should be a symmetric, continuous, positive definite, normalized function κ : U × U → R, where U ⊂ R^N is a compact subset.

Kernel Recursive Projection Identification (KRPI)
In this section, we introduce the proposed algorithm. The general concept is to operate the algorithm introduced in [26] in the feature space associated with a positive definite kernel κ, under the characteristic mapping Φ(.) as defined in Eq. (13). The sample sequence is transformed by using a feature map Φ:
To build the reproducing kernel Hilbert space model, we use the Gaussian radial basis function (RBF) kernel, which is a common default selection owing to its general-purpose approximation property and numerical stability. A further reason for choosing the Gaussian kernel is that the resulting optimization problem is convex: the solution is sought over a convex subset of the solution space, and the Mercer kernel induces a norm on that vector space. Since the problem is strictly convex, it has at most one optimal point. The Gaussian kernel is given by:
$$\kappa(u, u') = \exp\left(-\frac{\|u - u'\|^2}{2\sigma^2}\right)$$
where σ > 0 represents the smoothing parameter. Figure 2 illustrates the mapping of the data space U to the Hilbert space H obtained by the Gaussian reproducing kernel κ.
Fig. 2. Definition of the characteristic map.
The block diagram shown in Fig. 3 illustrates the adaptive kernel-based channel estimation using the proposed algorithm. The system input u(k) and output s(k) are considered measurable. The proposed KRPI algorithm is defined by the following steps:
-Step 1. Transformation of measured data. As the initial step, in order to create the input data, the observation data space X is transformed into a nonlinear Hilbert space H:
-Step 2. Application of the RP algorithm methodology. In the second step, the algorithm of Guo and Zhao [26] is used to minimize the cost function by applying its logic to the data sequence generated in the first step, as formulated in Eq. (15): where the term ω represents the weight vector in the reproducing kernel Hilbert space H.
-Step 3. Determination of the reproducing kernel in Hilbert space. In this step, we proceed directly in the feature space H, assuming that our data have already been successfully modeled in the RKHS using the mapping function Φ, i.e.:
The kernel recursive projection identification algorithm for estimating the parameter vector ω is summarized in Algorithm 1.

Initialization:
Initialize the weight vector ω(0). Select the kernel bandwidth σ, the constant scalar β, the data length N, and the threshold C.
Computation:

5) Compute the estimation of s(k) as:
6) Update weight vector:
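Because the equations of Algorithm 1 are not reproduced in this text, the sketch below is not the authors' algorithm. It shows one plausible KLMS-style kernelization of a recursive projection step, under these assumptions: the weight vector is kept as an expansion ω(k) = Σ α_i Φ(u_i), its output is evaluated through Gaussian kernel values only, each new coefficient is α_k = (β/k)[F(C − ⟨ω(k−1), Φ(u_k)⟩) − s(k)], the observations are coded as 0/1, the noise is Gaussian, and the projection step is omitted. The class `KernelRP`, the two-tap tanh system, and all numerical values are hypothetical.

```python
import math
import random


def gaussian_kernel(u, v, sigma_k):
    """Gaussian kernel with bandwidth sigma_k (2*sigma^2 scaling assumed)."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma_k ** 2))


def noise_cdf(x, sigma_b):
    """Distribution function F(.) of zero-mean Gaussian noise with std sigma_b."""
    return 0.5 * (1.0 + math.erf(x / (sigma_b * math.sqrt(2.0))))


class KernelRP:
    """Keeps omega(k) = sum_i alpha_i * Phi(u_i) implicitly as (centers, alphas)."""

    def __init__(self, C, beta, sigma_k, sigma_b):
        self.C, self.beta = C, beta
        self.sigma_k, self.sigma_b = sigma_k, sigma_b
        self.centers, self.alphas = [], []

    def predict(self, u):
        """Inner product <omega, Phi(u)> computed through kernel evaluations only."""
        return sum(a * gaussian_kernel(c, u, self.sigma_k)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, u, s):
        """Adds one kernel unit centred at u with the assumed RP-style coefficient."""
        k = len(self.centers) + 1
        innovation = noise_cdf(self.C - self.predict(u), self.sigma_b) - s
        self.centers.append(u)
        self.alphas.append(self.beta / k * innovation)
        return innovation


rng = random.Random(0)
theta = [0.5, -0.3]  # hypothetical channel taps
model = KernelRP(C=0.5, beta=5.0, sigma_k=0.5, sigma_b=0.3)
x_prev = 0.0
for _ in range(300):
    x = rng.uniform(-1.0, 1.0)
    u = (x, x_prev)
    d = math.tanh(theta[0] * u[0] + theta[1] * u[1]) + rng.gauss(0.0, 0.3)
    model.update(u, 1 if d <= 0.5 else 0)  # binary observation with threshold C = 0.5
    x_prev = x
```

The expansion grows by one kernel unit per sample, so the cost of each prediction increases linearly with the number of processed samples; sparsification schemes such as quantization address this but are outside this sketch.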

Simulations
In this section, we use Monte Carlo simulations to investigate the performance of the proposed KRPI algorithm in nonlinear system identification (NSI) with binary-valued output observations (Fig. 1) under different data inputs. We compare the performance of KRPI with that of RP in finite impulse response identification. Their parameters are set as: KRPI (C = 0.5, β = 1000, σ = 0.5), and RP (C = 0.5, β = 1000). It is worth noting that when we change one of these simulation parameters, the others remain unchanged. The mean square error (MSE) of the estimated impulse response parameters is chosen as the metric to evaluate the performance of these algorithms, and is defined as follows: where ω(i) and ω̂(i) represent the measured and estimated impulse response parameters in each iteration, respectively.
All simulations were implemented using Matlab software.
To examine the strength of the measurement noise for each algorithm, we define the signal-to-noise ratio (SNR) by the following relationship: where E[.] is the mathematical expectation.
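The two evaluation quantities can be sketched as follows. The exact normalizations used in the paper are not recoverable from this text, so the code assumes the usual forms: MSE as the average squared difference between true and estimated parameters, and SNR as 10·log10 of the ratio of average signal power to average noise power. The helper `noise_std_for_snr` is a hypothetical convenience for choosing the noise level that yields a target SNR.

```python
import math


def mse(true_params, est_params):
    """Average squared difference between true and estimated parameters."""
    return sum((t - e) ** 2 for t, e in zip(true_params, est_params)) / len(true_params)


def snr_db(signal, noise):
    """10*log10 of the ratio of average signal power to average noise power."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(b * b for b in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def noise_std_for_snr(signal, target_db):
    """Standard deviation of zero-mean noise that gives the target SNR in dB."""
    p_signal = sum(x * x for x in signal) / len(signal)
    return math.sqrt(p_signal / (10.0 ** (target_db / 10.0)))
```

For instance, a unit-power signal at a target of 20 dB calls for a noise standard deviation of 0.1, since the noise power must be one hundredth of the signal power.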
For all simulations, we considered the function f(.) (see Fig. 1) to be the hyperbolic tangent function (tanh). It is an infinitely differentiable function that realizes a bijection from R onto (−1, 1):
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
Since we have a nonlinear system with binary output s(k) = 1 or −1, the output values correspond to the asymptotes of the hyperbolic tangent function:
$$\lim_{x \to +\infty} \tanh(x) = 1, \qquad \lim_{x \to -\infty} \tanh(x) = -1$$
The physical interpretation of the hyperbolic tangent function in this context is that it introduces a nonlinear relationship between the input and output signals of the system.
Figures 4 and 5 show the input and output signals of the nonlinear system with binary-valued output observations (Fig. 1). The bottom graphs show the complete signals for a data length of N = 1000, and the top graphs zoom in on samples 400 to 500 to show the processed signals in more detail.

ETSI BRAN Mobile Radio Channels
The broadband radio access networks (BRAN) channel is a type of wireless channel that is used in broadband wireless communication systems, such as WiMAX and 4G LTE. The BRAN channel is characterized by its wide bandwidth, typically spanning several hundred megahertz, and its high data rates, which can be several tens of megabits per second.
In this subsection, we apply the algorithms presented above to assess the benefit of the proposed KRPI algorithm in identifying the parameters of an ETSI BRAN mobile radio channel, namely BRAN B. This model is suggested for use in large open areas and typical indoor scenarios with considerable delays under non-line-of-sight (NLOS) propagation conditions. Details about the ETSI BRAN mobile radio channels can be found in [51], [52].
The impulse response of the ETSI BRAN radio channels is described by: where δ(n), τ_i, and ω_i ∈ N(0, 1) denote the Dirac function, the time delay of path i, and the magnitude of path i, respectively.
In Fig. 6, for a data length fixed at N = 3000, with SNR = 20 dB and 50 Monte Carlo iterations, we have plotted the estimated parameters of the ETSI BRAN B channel as a function of the path delays using the two algorithms. From this figure, we notice that the proposed KRPI algorithm performs better, because its estimated parameters of the ETSI BRAN B channel impulse response are closer to the true ones. For the RP algorithm, in contrast, there is a visible difference between the estimated and measured impulse responses.

Magnitude and Phase Estimation
Using the proposed kernel recursive projection identification (KRPI) and recursive projection (RP) algorithms, we have identified, in this subsection, the amplitude and phase of the ETSI BRAN B channel impulse response. Figure 7 represents the estimation of the ETSI BRAN B magnitude and phase for SNR = 20 dB and a data length of N = 3000. As shown in this figure, the proposed algorithm offers a more accurate estimation of the amplitude and phase than the RP algorithm: with KRPI, most of the estimated values coincide with those measured for ETSI BRAN B, whereas with RP a clear difference between the estimated amplitude and phase patterns and the real model is visible.
Figure 8 illustrates the results of the estimation of the ETSI BRAN B magnitude and phase by employing the proposed KRPI algorithm for a fixed SNR of 20 dB and various data lengths N. It shows that, for a data length of N = 2500, the estimated magnitude and phase are very close to the true ones. For a data length of N = 1500, the estimated phase is close to the true value, and the estimated magnitude follows the real model with minor differences. In unfavorable conditions, i.e. a small data sample (N = 500), the performance of the proposed algorithm degrades for both magnitude and phase estimation. To summarize, the data length N has a significant impact on the estimated phase, but only a minor impact on the estimated magnitude. The proposed algorithm therefore needs a large data sample to obtain an accurate approximation of the real values.
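The magnitude and phase curves discussed above can be obtained from any impulse response, true or estimated, by sampling its frequency response. The sketch below assumes that magnitude and phase refer to H(w) = Σ h(n)·exp(−jwn) evaluated on a frequency grid in [0, π); the two-tap impulse response is a hypothetical example, not a channel from this paper.

```python
import cmath
import math


def frequency_response(h, num_points=64):
    """Samples H(w) = sum_n h[n] * exp(-j*w*n) at num_points frequencies in [0, pi)."""
    response = []
    for m in range(num_points):
        w = math.pi * m / num_points
        response.append(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))
    return response


def magnitude_and_phase(h, num_points=64):
    """Returns (|H(w)|, arg H(w)) on the same frequency grid; phase is in radians."""
    H = frequency_response(h, num_points)
    return [abs(x) for x in H], [cmath.phase(x) for x in H]


h_example = [1.0, 0.5]  # hypothetical two-tap impulse response
mag, phase = magnitude_and_phase(h_example)
```

Comparing the curves produced from the true taps with those produced from the estimated taps gives the kind of magnitude and phase comparison reported in the figures.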

Macchi Channel
We adopted the Macchi channel to evaluate the proposed algorithm on a theoretical model. The Macchi channel is a theoretical frequency-selective channel [53]. The impulse response of this channel is described by the following model:

Impulse Response Parameter Estimation
Under the conditions of SNR = 20 dB and N = 3000, Fig. 9 illustrates the estimations of the Macchi channel impulse response parameters using the two algorithms presented previously. This comparison shows that the proposed KRPI algorithm can be successfully used for the identification of the Macchi channel impulse response parameters: the average estimated values are close to the real model. This could be due to the nonlinear nature of the system to be identified: the proposed KRPI algorithm employs a linear model in a high-dimensional feature space, which is equivalent to applying a nonlinear technique in the original space. In contrast, the RP algorithm is a linear method.

Magnitude and Phase Estimation
In this subsection, the Macchi channel impulse response magnitude and phase are estimated. Figure 10 represents the estimation of the Macchi channel's magnitude and phase using the KRPI and RP algorithms for a data length of N = 3000 and SNR = 20 dB. Based on this result, we can conclude that the proposed KRPI algorithm outperforms the RP algorithm, since its estimated amplitude and phase are approximately the same as the measured ones. Using the RP algorithm, we obtain a large difference between the estimated and measured phases.
Figure 11 shows the average values of the estimated parameters using the proposed KRPI algorithm. In a very noisy environment (lowest SNR = 0 dB), the Gaussian noise influences the estimated parameters of the model, and we see a significant difference between the estimated values of the amplitude and phase and the measured data. The noise has only a slight influence on the estimation of the impulse response parameters when its variance is small (e.g. for SNR = 10 dB). For SNR = 20 dB, the estimated magnitude and phase are very close to the measured ones. According to this result, the performance of the proposed KRPI algorithm in a very noisy environment (SNR tending towards 0 dB) is acceptable compared to that of the RP algorithm.

Mean Square Error Criterion
In this subsection, the performance of the KRPI and RP methods is compared for SNR values ranging from 0 dB to 30 dB. The mean square error (MSE) results are reported in Tables 2 and 3. As presented in these tables, for both algorithms the MSE decreases as the SNR increases for a fixed N. For the two studied channels (Macchi and ETSI BRAN B), the proposed KRPI algorithm achieves better performance than the RP algorithm for all SNR values, even in a high-noise environment (SNR = 0 dB), since the MSE values of the proposed KRPI are much lower than those obtained by the RP algorithm. For example, in the case of the ETSI BRAN B channel impulse response, the MSE value achieved by the proposed KRPI algorithm amounts to only 9.44% and 6.6% of the MSE value obtained using the RP algorithm when the SNR is equal to 0 dB and 10 dB, respectively. In the case of the Macchi channel impulse response, it represents 20.56% and 0.68% of the MSE value obtained by the RP algorithm for SNR = 20 dB and 30 dB, respectively.

Conclusion and Future Work
In this paper, we proposed a novel kernel method for single-input single-output (SISO) nonlinear system identification with binary-valued observations. This approach employs a kernel function to perform an implicit mapping of the data using the kernel trick, which maps the original measured input data into an infinite (or high)-dimensional nonlinear space. The method was used to estimate the impulse response parameters of the ETSI BRAN B and Macchi channels. The numerical simulation results show that the proposed KRPI algorithm estimates the parameters of the finite impulse response system with a good level of accuracy, higher than that achieved by the recursive projection (RP) algorithm, whose performance deteriorates considerably. As far as channel impulse response identification is concerned, the proposed KRPI algorithm is effective and efficient in identifying the amplitude and phase of the channels (ETSI BRAN B and Macchi) for various SNR values.
Future work will focus on the stability and convergence of our solution in the context of non-line-of-sight outdoor channel identification for wireless sensor networks based on impulse radio ultra-wideband technology.