Improved Association Rule Mining-Based Data Sanitization for Privacy Preservation Model in Cloud

Data security in cloud services is achieved by imposing a broad range of privacy settings and restrictions. However, the different security techniques used fail to eliminate the hazard of serious data leakage, information loss and other vulnerabilities. Therefore, better security policy requirements are necessary to ensure acceptable data protection levels in the cloud. The two procedures presented in this paper are intended to build a new cloud data security method. Here, sensitive data stored in big datasets is protected from abuse via a data sanitization procedure that relies on an improved apriori approach to clean the data. The main objective in this case is to generate a key using an optimization technique known as Corona-integrated Archimedes Optimization with Tent Map Estimation (CIAO-TME). This technique deals with both restoration and sanitization of data. The problem of optimizing the information preservation ratio (IPR), the hiding ratio (HR), and the degree of modification (DOM) is formulated and researched as well.


Introduction
Cloud computing (CC) creates a massive virtualized data resource pool with various services connecting a huge number of resources [1]-[3] to achieve the required levels of portability and reliability. The three fundamental service delivery models are infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), and the four development trends observed in the world of CC are public cloud, private cloud, cloud platform, and virtual private cloud [4]-[6]. IaaS is a service-managed, computer hardware and fixed network provisioning concept with significant expansion capabilities. PaaS is a service paradigm that uses the middleware of the service model to offer integrated development environments, frameworks, applications, and development tools [7]-[9], while SaaS is a category of remote computing services. All three models are used in cloud computing. However, because users are not aware of the sources or owners of specific resources in such a structure, it is more challenging to protect such resources and services against attacks. The task is even more complicated because many companies owning and running community groups, as well as outside stakeholders, operate in the capacity of cloud administrators. To benefit from the advantages of multiple cloud deployment methodologies, a hybrid cloud concept is introduced which integrates two or more clouds. Private virtual cloud is the term used to describe the common pool of resources in a cloud system [10]-[12].
Security systems, key distribution, encryption, access, identity authentication, audit scheduling, as well as human and physical access control are some of the security issues affecting cloud data [13], [14], as defined by several privacy-protection strategies [15]-[17]. Improved data security levels have been achieved in the cloud through the development of a privacy-conscious access control system. Researchers have proposed a unique method for combining spectral band handprint images, depending upon the complex dual-tree transform as well as a feature extraction and minimization method based on the Gabor wavelet transform (GWT) and principal component analysis (PCA).
With such an environment taken into consideration, this article offers the following contributions:
- it employs a modified apriori approach to sanitize system information,
- it proposes the CIAO-TME method to generate the best key while adhering to the degree of modification (DOM), information preservation ratio (IPR), and hiding ratio (HR) requirements.
The rest of the paper is organized as follows. Section 2 evaluates the existing literature. The suggested model and its features are described in Section 3. Section 4 covers key generation processes and the data sanitization method. Results and discussion are given in Section 5, while conclusions are presented in Section 6.

Literature Review
Danish et al. [18] formulated their approach to privacy protection in the cloud environment by employing artificial intelligence (AI). The authors stated that AI abilities were helping companies achieve greater productivity in the corporate cloud environment. Data normalization and recovery were the two key steps of the recommended privacy-preserving system. The extraction of many co-functions, including such factors as DOM, HR, and IPR, leads to the generation of an optimal key. The study demonstrated the effectiveness of the suggested model in improving cloud security when compared to other methods.
Avijit and Radha [19] employed the honeypot algorithm for data protection or IDS, which was a good strategy for predictions and privacy protection. The dataset was initially standardized using the normalization approach, a process which involves replacing missing values and removing unnecessary data. Following that, unique features were extracted and the best models were chosen using the GLCM algorithm. The target was predicted using a unique CNN classifier that offers high attack detection accuracy levels. The developed algorithm was used to protect information from infiltration and other assaults. In addition, a cryptographic mechanism was utilized to ensure the required secrecy protection, while encryption was performed using HPA. If the data holder requests a specific file, the cloud server generates a key and verifies it by interacting with the user for authentication purposes. The performance of the solution was evaluated and compared with that of other existing methodologies to demonstrate the efficacy of the proposed scheme.
Tehsin et al. [20] investigated the privacy-preserving authorization paradigm for the cloud and privacy-preserving strategies for cloud-based EHRs using a defined taxonomy. Inner login control and outer security in outsourced system design for hybrid cloud were formulated and then the PPX-AC algorithm was developed, combining fine login control with the multifunctional use of EHRs and a cutting-edge privacy mechanism. Using HLPN, the authors confirmed the efficiency of the proposed PPX-AC by invalidating known privacy threats. Furthermore, the described model demonstrates its efficacy and multifunctional application possibilities. Tian et al. [21] proposed an IM-based methodology for using MIQP to solve the energy management problem. The viability and efficiency of using IM in MIQP were demonstrated by incorporating cloud-edge architecture. To better fit real applications, the general criteria of the MIQP IM were expanded in terms of security and implementation cost.
Luis et al. [22] presented smart CAMPP data to achieve cloud authorization. Format-conserving encoding methods were used to outsource them discreetly. Furthermore, the observations demonstrated the applicability of the proposed technique, so that high accuracy levels can be expected. In contrast to a method that does not improve security, the authors' proposal has no substantial influence on encryption.
Pan [23] presented various cloud privacy security vulnerabilities and then proposed a complete privacy security prevention architecture. The characteristics of several techniques were also compared, including network access tools, CP-ABE, KP-ABE, the fine-grain, polynomial number of authority, dismissal mechanism, the detect mechanism, PRE, various leveled encryption, and a mixture of other methods.
Chen et al. [24] designed a lightweight encryption system that maintains an acceptable usefulness model while ensuring evidential privacy preservation. Using the specified prototype system, the recommended approach was deemed secure against an honest-but-curious host and a catastrophic collision. The effectiveness of the method was examined and compared to similar solutions using the MNIST and UCI human action recognition databases. This strategy decreased the runtime by 20% and the communicated cipher text length by 85%, on average, while maintaining the accuracy of competitive SMC methods.
Yong et al. [14] presented a blockchain-based EHR sharing mechanism ensuring both secure and private features by employing encryption algorithms. Additionally, evidence of permission was intended to serve as the consensus protocol for consortium blockchains in order to guarantee the software's reliability. According to a study, the proposed protocol meets the security goals and has good computational efficiency.

Author | Method | Advantages | Drawbacks
Danish et al. [18] | J-SSO algorithm | It is an algorithm with fewer steps and strong global findings | When it comes to ensuring the security of every database, the OI-CSA and BS-WOA strategies offer low convergence performance
Avijit and Radha [19] | Honeypot method | Due to efficient implementation of the system, the security rate of online services has been improved | It is only capable of detecting direct assaults
Tehsin et al. [20] | PPX-AC model | Enables fine-grained access control while maintaining privacy | When data is sent between parties, it is essential to make sure that it is secure, as the parties involved are unaware of the information shared between both original parties
Tian et al. [21] | MIQP algorithm | Is an optimum solution with good accuracy, while lowering the computational cost | Preserving privacy while decreasing computing costs to the highest possible extent is hard to achieve
Luis et al. [ … | … | … | …
… | EHR method | Data sharing methods among medical institutions were identified | The developer fails to provide timely updates

Distributed locations with cloud processing infrastructures and mass data storage create privacy concerns. For instance, Google's cloud servers are spread out all over the world, including seven sites in the Americas, two in Asia, and three in Europe. Additionally, customers must be aware of where the cloud hosting is located, because privacy laws vary between countries. This shows the importance of cyber security aspects in cloud computing.
Many algorithms have been introduced to solve the privacy preservation problem, but there is no optimal solution yet. For example, the J-SSO algorithm [18] offers a poor convergence rate that causes failures in this sort of application. Similarly, such algorithms as the honeypot method [19] are capable only of detecting direct assaults, while the smart CAMPP algorithm [22] often creates challenges, as no previous efforts have focused on motion sensors. Moreover, such encryption standards as KP-ABE and CP-ABE also need some improvement in the degree of privacy provided. Therefore, there is still a need for research on advanced privacy preservation models.

Proposed Method Concerning Data Security in Cloud
This work proposes two procedures that attempt to develop a new cloud data security method: a data sanitization process and a data restoration process. The data sanitization process sustains the security of sensitive data in the cloud by concealing it from unauthorized users and preventing it from being accessed. In this scenario, data is sanitized using an upgraded version of the apriori algorithm, then data restoration is incorporated for restoring or recovering the sanitized data. In both processes, key generation plays a very important role and should be performed optimally to guarantee data safety. Here, the recommended CIAO-TME technique is used to identify the best key for both the data restoration and sanitization stages. Considering this to be an optimization problem, DOM, IPR, and HR-related objectives are used for obtaining the key.
Figure 1 shows the diagram of the concept proposed for ensuring preservation of data security in the cloud.

Data Sanitization
The association rule mining (ARM) approach is enhanced by using the apriori algorithm containing the following three steps:
1) Scan the transaction database once and obtain the sampling set for each item. During the sampling procedure, the algorithm picks a random sample S from the database da and then searches for frequent item sets in S. The risk of missing frequent item sets in the sample can be reduced by lowering the so-called min-support. The impact factor (IF) is computed in order to remove the victim item set. It is compared against a threshold value $\alpha_{min}$ such that:
- if $\alpha_{min}$ or IF is 1 or more, the item set is kept (considered as sensitive data),
- if $\alpha_{min}$ or IF equals 0 or less, the item set is considered a victim item set and is removed.
2) Using the overlap strategy for counting the support of candidate item set $c_k$, the sampling sets of $L_{k-1}$ and of $L_1$ are created.
3) If $|L_k| \le k$, the algorithm is terminated.
By using the apriori model, the rules for sanitization are created and, similarly, the reverse rules are designed. Next, using the XOR function with the key values, the adopted data sanitization process is performed.
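As an illustration of the level-wise search and of the termination test in step 3, the following is a minimal sketch of the plain apriori procedure. It is not the improved variant described here: sampling, the overlap strategy, and the impact factor are left out, and the function name `apriori` and its parameters are chosen for this example only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items in one pass over the database
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)
    k = 1
    # A frequent (k+1)-item set has k+1 frequent k-subsets,
    # so nothing larger can exist once |L_k| <= k.
    while len(level) > k:
        prev = set(level)
        # Join step: unions of two frequent k-item sets that have size k+1
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
    return frequent

# Toy database of five transactions
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=3)
```

The frequent item sets returned by such a search are the starting point from which association rules, and hence sanitization rules, can be derived.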

Key Generation
One of the important and computationally time-consuming steps in the creation of a security mechanism is key generation. The creation of uncrackable and non-derivable secure keys is a complicated computational task. The key matrix and the initial cloud data are both essential for preparing the sanitized data: the original data $d_S$ is combined with the optimally produced key $Ky_2$ to obtain the sanitized data $d'_S$. Generating the key is a part of the CIAO-TME model.

Restoration Procedure
Data sanitization is a technique used for concealing private or sensitive information in a cloud with the goal of preventing unwanted data leakages. Data restoration is the opposite process: using the special key created during the data sanitization procedure, the sensitive data is revealed again. The same key that the CIAO-TME model created for the purpose of generating sanitized data is used to recover the original information, and the recovered data is denoted by $\hat{d}_S$.
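The round trip between sanitization and restoration can be sketched as follows, assuming that both steps reduce to an element-wise XOR between the data and a key matrix of the same shape. This is a simplification for illustration; the exact sanitization and restoration formulas may include further terms, and the names `sanitize` and `restore` are hypothetical.

```python
def sanitize(data, key):
    """XOR every entry of the data matrix with the matching key entry."""
    return [[v ^ k for v, k in zip(row, key_row)]
            for row, key_row in zip(data, key)]

def restore(sanitized, key):
    """XOR is its own inverse, so applying the same key recovers the data."""
    return sanitize(sanitized, key)

d_s = [[1, 0, 3], [7, 2, 0]]   # stand-in for the original data
ky2 = [[5, 5, 1], [2, 2, 9]]   # stand-in for the optimized key matrix
d_s_prime = sanitize(d_s, ky2)
d_s_hat = restore(d_s_prime, ky2)
```

Because the same key serves both directions, the quality of the key decides how well sensitive entries are hidden and how much of the remaining data is disturbed.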

Novel CIAO-TME Optimal Key Selection
To identify the optimal key, the objective function is first formulated in terms of HR, DOM, and IPR, which are the objective functions considered for data sanitization. Figure 2 presents the solution encoding scheme. HR is the percentage of sensitive items that are properly concealed in $d'_S$; it is computed from $tp$, the total number of hidden data indexes, and $N_d$, the number of non-zero indexes. IPR is the inverse of the data lost, i.e., the rate of non-sensitive rules that are not concealed in the sanitized data set; it is computed from $tp$, the total number of saved data indexes, and $N_2$, the total number of zero indexes.
The original dataset $d_S$ and the sanitized dataset $d'_S$ determine the DOM function, which is the Euclidean distance between them, $DOM = \lVert d_S - d'_S \rVert_2$, where $d_S$ denotes the original dataset and $d'_S$ denotes the sanitized dataset.
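The three measures can be sketched in a few lines. DOM follows directly from its definition as a Euclidean distance. For HR and IPR, the sketch assumes the simplest reading of the definitions above, namely a plain ratio of the two quantities named for each measure; this reading is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def hiding_ratio(tp_hidden, n_nonzero):
    # Assumption: HR read as hidden data indexes over non-zero indexes
    return tp_hidden / n_nonzero

def information_preservation_ratio(tp_saved, n_zero):
    # Assumption: IPR read as saved data indexes over zero indexes
    return tp_saved / n_zero

def degree_of_modification(original, sanitized):
    # Euclidean distance between the flattened original and sanitized data
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, sanitized)))

dom = degree_of_modification([3, 0], [0, 4])
```

A key search would then trade these quantities off against each other, since hiding more sensitive entries generally means modifying more of the data.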

Proposed CIAO-TME Model
The proposed CIAO-TME model is a hybrid optimization approach created by combining the traditional Archimedes optimization algorithm (AOA) [25] and the coronavirus herd immunity optimizer (CHIO) [26]. AOA is a high-performance optimization technique as far as convergence time and exploration-exploitation balance are concerned. The advantages of CHIO include high effectiveness when dealing with a large number of optimization issues across a wide range of optimization domains. To overcome the limitations of the AOA and CHIO models, the hybrid optimization CIAO-TME model is proposed. It combines the two common optimization models to speed up convergence toward the solution [27]-[30]. The individual steps of the model are presented below.
In the first step, a population of N search agents is initialized, and the positions of the search agents and other algorithmic parameters are initialized within the bounds $LB_I$ and $UB_I$, which denote the lower and upper bounds of the I-th search agent, respectively.
Then, density $Den_I$ and volume $Vol_I$ of the search agents are set randomly, and acceleration $A_I$ of the I-th search agent is assigned. The initial population is then evaluated and the search agent characterized by the best fitness level is selected, which yields $A_{best}$, $Vol_{best}$, and $Den_{best}$. For the next position, density $Den_I^{t+1}$ and volume $Vol_I^{t+1}$ are determined from the current values and from $Den_{best}$ and $Vol_{best}$, which denote the best density and best volume, respectively. The Rand function denotes a random value with uniform distribution. The transfer operator $TF$ is modeled as a function of $itr$ and $max_{itr}$, the current and maximum iterations, respectively.
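The initialization and update rules of this first step can be sketched as follows, assuming they follow the standard AOA formulation [25]: uniform sampling between the bounds, a random step toward the best density and volume, and an exponential transfer operator. The helper names and the dictionary layout are hypothetical, and the CIAO-TME variant may differ in detail.

```python
import math
import random

def init_agent(lb, ub, rng):
    """Random position, density, volume and acceleration for one search agent."""
    dim = len(lb)
    return {
        "pos": [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)],
        "den": [rng.random() for _ in range(dim)],
        "vol": [rng.random() for _ in range(dim)],
        "acc": [lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)],
    }

def move_toward_best(current, best, rng):
    """x_{t+1} = x_t + Rand * (x_best - x_t), used for density and volume."""
    return [c + rng.random() * (b - c) for c, b in zip(current, best)]

def transfer_operator(itr, max_itr):
    """TF = exp((itr - max_itr) / max_itr); it grows toward 1 over the run."""
    return math.exp((itr - max_itr) / max_itr)

rng = random.Random(1)
agent = init_agent([0.0, -1.0], [1.0, 1.0], rng)
agent["den"] = move_toward_best(agent["den"], [0.9, 0.9], rng)
tf = transfer_operator(10, 100)
```

With this form of $TF$, the value 0.5 is crossed after a fraction of $1 - \ln 2 \approx 0.31$ of the iterations, which is where the search switches between its two update regimes.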
Step 2 is the exploitation phase. If $TF \le 0.5$, a collision occurs between the search agents. The acceleration factor is then modeled in terms of $Den_{mr}$, $Vol_{mr}$, and $A_{mr}$, which denote the density, volume, and acceleration of random search agents, respectively.
If $TF > 0.5$, the position of the search agent is updated. In CIAO-TME, the position is updated by combining the concepts of AOA and CHIO, where $\lambda$ refers to the force of infection, $\tau$ is the transmission rate, $M$ refers to the birth rate, i.e., the percentage of individuals who are added to the entire population, $A^{t+1}_{I-Norm}$ is the normalized acceleration, $C_2 = 6$ and $C_3 = 0.3$ are constant values, $T = C_3 \cdot TF$, and $F$ is the flag that shows the direction. In addition, $r$ denotes the random value that is generated by means of the tent map to improve the convergence rate.
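A tent-map generator for the value $r$ can be sketched as below. The exact map and parameter used in CIAO-TME are not stated, so the skewed form with breakpoint `a = 0.7` is an assumption taken from common practice in chaotic metaheuristics; setting `a = 0.5` gives the classic symmetric tent map. The function names are hypothetical.

```python
def tent_map(x, a=0.7):
    """One tent-map step on [0, 1]: rise on [0, a), fall on [a, 1]."""
    return x / a if x < a else (1.0 - x) / (1.0 - a)

def tent_sequence(seed, n, a=0.7):
    """Return n successive tent-map values starting from seed in (0, 1)."""
    values = []
    x = seed
    for _ in range(n):
        x = tent_map(x, a)
        values.append(x)
    return values

r_values = tent_sequence(0.37, 100)
```

Each entry of `r_values` can stand in for a call to a uniform random generator inside the position update. The sequence is deterministic for a given seed but spreads over the whole unit interval.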
Step 3 defines the proposed exploration phase. When $TF > 0.5$, there is no collision and the acceleration of the search agent at iteration $itr + 1$ is computed. For CIAO-TME, the acceleration factor is computed using the sample variance $S^2$ and the weight $we_i$. For $TF \le 0.5$, the update of the I-th search agent for iteration $itr + 1$ is computed accordingly. Finally, in step 4, the algorithm returns the best solution found.
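For reference, the acceleration update of the standard AOA [25] can be sketched as follows: the acceleration is driven by a randomly chosen agent while $TF \le 0.5$ and by the best agent once $TF > 0.5$, and is then rescaled with the constants 0.9 and 0.1. The CIAO-TME variant based on sample variance and weights is not reproduced here, and the function names are hypothetical.

```python
def acceleration(den_i, vol_i, den_src, vol_src, acc_src):
    """acc = (den_src + vol_src * acc_src) / (den_i * vol_i) for one agent."""
    return (den_src + vol_src * acc_src) / (den_i * vol_i)

def pick_source(tf, random_agent, best_agent):
    """Random agent drives the update while TF <= 0.5, best agent afterwards."""
    return random_agent if tf <= 0.5 else best_agent

def normalize(acc_values, upper=0.9, lower=0.1):
    """Min-max rescale: upper * (a - min) / (max - min) + lower."""
    lo, hi = min(acc_values), max(acc_values)
    if hi == lo:
        # All agents share one acceleration; fall back to the lower constant
        return [lower for _ in acc_values]
    return [upper * (a - lo) / (hi - lo) + lower for a in acc_values]

acc = acceleration(0.5, 0.5, 0.25, 0.5, 1.0)
norm = normalize([1.0, 2.0, 3.0])
```

The normalized value controls the step size: agents far from the optimum receive large values and move in big steps, while agents near it receive small values and refine their position.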

Results and Discussion
The proposed CIAO-TME data security model was implemented in Python using a sample dataset downloaded from [31]. The databases known as Cleveland, Hungary, Switzerland, and VA Long Beach were used. The proposed model was evaluated against AOA, CHIO, HBA, BES, and BMO for a variety of metrics, such as DOM, HR, IPR, and cost.

Convergence Analysis
Figure 3 shows the cost factors for three datasets (D1, D2, and D3), illustrating convergence of the implemented CIAO-TME and comparing it with the traditional schemes: AOA, CHIO, HBA, BES, and BMO. CHIO and AOA revealed poor performance, incurring higher cost values in the first, second, and third scenarios. CIAO-TME shows slightly higher cost values before the 10-th iteration and, from the 12-th to the 50-th iteration, converges to a minimum cost of 0.001 for all scenarios.
In terms of IPR, CIAO-TME obtained a better value (450) for D3 and a relatively low IPR for D1 and D2, when compared to D3. The outcomes of using CIAO-TME are better when compared with the AOA, CHIO, HBA, BES, and BMO models, especially with regard to the D3 dataset, due to the optimal generation of the key and an improved apriori-based ARM.

Fitness Analysis
Figure 5 shows the fitness examination defined by Eq. (3) for the CIAO-TME scheme over AOA, CHIO, HBA, BES, and BMO for the D1, D2, and D3 datasets. The objective is to minimize the HR function while keeping IPR high and DOM at a relatively low level. Here, the minimal outcomes are achieved by CIAO-TME at D3. For D1 and D2, the CIAO-TME model has attained the best outcomes in terms of the objective. However, compared to D1 and D2, the D3-related outcomes were the best. This is the result of optimal generation of the key and an improved apriori-based ARM.

Comparison Summary
The proposed CIAO-TME cloud data security scheme is compared with existing ARM schemes for different metrics (Table 3). CIAO-TME-based security models are tested in terms of performance fitness at high HR, high IPR, and low DOM. An HR of 0.68 is achieved for CIAO-TME, i.e. a result that is better than the one obtained when using ARM or an approach with no CIAO-TME involved. The best results were achieved on the D1 and D2 datasets. The merging of optimal key generation and the improved apriori-based ARM additionally enhanced the CIAO-TME scheme.

Analysis of Existing Works
For three test instances, Table 4 shows the comparison of the proposed CIAO-TME with other existing schemes, such as GMGW [31] and J-SSO [18]. The adopted CIAO-TME-based security method is evaluated in terms of such metrics as fitness, HR, IPR, and DOM, with the lowest DOM (∼0.48) achieved for D3. This result is superior to that of other currently used techniques. On the D2 and D3 datasets, CIAO-TME obtained an improved HR (∼1.0), and for the D1 dataset, the proposed solution achieved the lowest HR (∼0.68). Additionally, the IPR on D3 was higher (∼458), while for D1 and D2 it was relatively low. Overall, CIAO-TME produces better results than the AOA, CHIO, HBA, BES, and BMO models.

Attack Analysis
Tables 5-6 present the examination results for three datasets under simulated CPA and KPA attacks. A chosen-plaintext attack (CPA) is a cryptanalysis attack model that assumes the attacker can obtain the ciphertexts for arbitrary plaintexts. The known-plaintext attack (KPA) is a cryptanalysis attack type in which the attacker has access to both the plaintext (also known as a crib) and the encrypted version of the data (ciphertext). These can be used to divulge additional classified information, including security codes and private keys. CIAO-TME attained minimal KPA and CPA attack vulnerability values versus AOA, CHIO, HBA, BES, and BMO for all datasets. Specifically, the third dataset (D3) revealed lower attack parameter values for the CPA, while for the KPA, dataset D1 is characterized by lower attack values than datasets D2 and D3. This significant improvement stems from the enhanced apriori-based ARM and optimized key generation.

Conclusion
This work proposes two procedures to protect sensitive data from unauthorized access and users by relying on the data sanitization process. The modified apriori approach is used to clean the data and the major goal is to generate the best key, a task accomplished by employing an optimization method.
A reversible method, known as data restoration, makes it possible to retrieve the original content from the sanitized data. IPR, HR, and DOM were the objectives set to tackle the optimization problem. In the simulations, the CIAO-TME scheme achieved the lowest DOM for the D3 dataset and outperformed other techniques that are currently in use, such as J-SSO and GMGW. On the D2 and D3 datasets, the CIAO-TME scheme improved HR to approx. 1.0. CIAO-TME also achieved the lowest HR (0.68) on the D1 dataset.

Fig. 1. The proposed concept ensuring data security in the cloud.

Fig. 2. The proposed data security in the cloud.

Figure 4 illustrates the analysis of DOM, HR, and IPR parameters for the CIAO-TME method over known techniques, for the D1, D2, and D3 datasets. The proposed CIAO-TME scheme achieved the lowest DOM (∼0.45), a result that is better than the one characterizing other schemes, such as AOA, CHIO, HBA, BES, and BMO. Additionally, CIAO-TME achieved a higher HR (∼1.0) for D2 and D3 and the lowest HR for D1, when compared to D2 and D3. Moreover, the proposed CIAO-TME scheme obtained a better IPR.

Table 2 summarizes the research on recent cloud data security methods.
Tab. 2. Summary on research in existing papers.

Metrics | CIAO-TME | Proposed with existing ARM | Proposed with no CIAO-TME
Tab. 5. CPA analysis results.
Tab. 6. KPA analysis of the proposed and other known schemes.