## Udita Paul, Jiamo Liu, Sebastian Troia, Olabisi Falowo and Guido Maier

| Symbol | Description |
|---|---|
| [TeX:] $$B$$ | Set of base station coordinates |
| [TeX:] $$b$$ | Total number of base stations |
| [TeX:] $$K$$ | Set of K-means centroid coordinates |
| [TeX:] $$k$$ | Total number of clusters |
| [TeX:] $$O$$ | Overall aggregated distance |
| [TeX:] $$d$$ | Distortion value |
| [TeX:] $$Y$$ | Established location of the data center |
| [TeX:] $$C$$ | Base station cluster |
| [TeX:] $$w_{iz}$$ | Weight of the [TeX:] $$i^{th}$$ base station in zone [TeX:] $$z$$ |
| [TeX:] $$q_{z}$$ | Total number of base stations in zone [TeX:] $$z$$ |
| [TeX:] $$A_{z}$$ | Set of traffic volumes of the base stations in zone [TeX:] $$z$$ |
| [TeX:] $$v_{iz}$$ | Traffic volume of the [TeX:] $$i^{th}$$ base station in zone [TeX:] $$z$$ |
| [TeX:] $$x_{jz},y_{jz}$$ | Geographical coordinates of the [TeX:] $$j^{th}$$ candidate data center location in zone [TeX:] $$z$$ |
| [TeX:] $$d_{oj}$$ | Distance between the candidate data center location in a zone and a point |
| [TeX:] $$V_{z}$$ | Design capacity of the data center in zone [TeX:] $$z$$ |
| [TeX:] $$D_{z}$$ | A capacity multiplier in [1, 2] for the data center in zone [TeX:] $$z$$ |
| [TeX:] $$f(std)_{z}$$ | Standard deviation of the hour with the maximum sum of mean and one standard deviation in zone [TeX:] $$z$$ |
| [TeX:] $$g(mean)_{z}$$ | Mean of the hour with the maximum sum of mean and one standard deviation in zone [TeX:] $$z$$ |
| [TeX:] $$P_{h}(mean+1std)_{z}$$ | Probability that the traffic volume of the [TeX:] $$h^{th}$$ hour in zone [TeX:] $$z$$ is less than the sum of mean and one standard deviation of that hour |
| [TeX:] $$\alpha$$ | Capacity design constant, −0.6 |
| [TeX:] $$P_{s}$$ | Capacity design constant, 0.6 |

A review of the existing research in this area shows that it does not focus on the design of data centers using real-world cellular traffic. As suggested in the literature, adaptive utilization and placement of cloud computing resources and functionalities are essential to provide optimal services to end users. Since traffic patterns change with space and time, it is crucial to determine the traffic profile of each region within a geographical area. With proper analysis and forecasting of traffic patterns in different locations, data centers can be optimally placed and their resources properly utilized. To the best of our knowledge, no existing literature has studied the TIM data set in this context, which is the main focus of our work. The symbols used throughout the paper are listed in Table I.

In this section, we first present the dataset utilized for the purpose of our work. We then present the algorithm used in forming various regions within the city to facilitate our design of data centers.

The data used in this work was released by Telecom Italia in 2015 and has been made publicly available [41]. It contains call detail records (CDRs) of different areas over the period from November 1, 2013 to January 1, 2014 within the Italian city of Milan and the Province of Trento. The data set divides the considered area into a grid of 10,000 geographically located square cells, each covering an area of 235 m by 235 m. The CDRs contain information on each cell's different telecommunication activities, such as the number of calls, SMS and internet activity within a ten-minute period. As internet activities demand the most resources in today's wireless communication, for the purpose of this work we only consider the amount of internet activity that occurs within a cell in a given time frame. As the data set contains 62 days' worth of records, we separated the holidays (22 days) from the working days (40 days). Another important attribute of this data set lies in how the CDRs are recorded. Each internet-activity entry in the dataset represents the number of times a connection is initiated or terminated, and a new CDR is also registered every time an ongoing connection exceeds 5 Megabytes (MB) of traffic. Since we can identify neither the exact number of new and terminated internet connections nor the exact volume of traffic exchanged within a connection, we assume half of the records to be new connections, each carrying 5 MB of internet traffic. The cells of the TIM data set are presented in Fig. 2.
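The counting rule above can be sketched as follows (a minimal illustration of our volume assumption, not part of the released dataset's API):

```python
# Internet-activity CDRs count connection starts/ends and 5 MB threshold
# crossings; we treat half the records as new connections of 5 MB each.
CDR_VOLUME_MB = 5

def estimate_traffic_volume_mb(num_internet_records: float) -> float:
    """Estimated traffic volume (MB) implied by a cell's internet CDR count."""
    new_connections = num_internet_records / 2  # half assumed to be new connections
    return new_connections * CDR_VOLUME_MB
```

For example, 120 internet records in a cell-hour would be read as 60 new connections, i.e. 300 MB of traffic.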

This dataset does not reveal the geographical positions of the base stations in the Milan area. Therefore, we processed another dataset [42] that collects information about the base stations of Telecom Italia deployed in Milan. The results obtained from the analysis of this data set were used to match the traffic volume information provided in the TIM data set. By matching these two data sets we were able to obtain further information such as: the total number of TIM base stations in the Milan grid (2554), the geographical coordinates of each base station, and the hourly number of CDRs experienced by each base station during the period over which the dataset was formulated [18].

In order to analyse the mobile internet traffic that different areas in Milan experience, it is necessary to divide the city into zones. This division needs to take into consideration the distribution of base stations within the city: a certain number of base stations must be clustered together to form a single zone. To achieve this clustering of base stations, we employ a popular clustering method, the K-means clustering algorithm.

In this work, the objective of the K-means algorithm is to determine [TeX:] $$k$$ centroids, each associated with a set of members, such that the overall aggregated distance is minimized. Depending on the locations of the members, the positions of the centroids and their memberships are updated iteratively to obtain the best possible location for each centroid. The process is repeated until the membership of each centroid remains unchanged from the previous iteration, at which point the algorithm has converged to a local minimum of the overall aggregated distance.

In our case, given a set of base stations [TeX:] $$B$$ containing the [TeX:] $$b$$ (2554) pairs of geographical coordinates (in the form of latitude ([TeX:] $$x$$) and longitude ([TeX:] $$y$$)), we can assign [TeX:] $$K$$ as the set containing the geographical locations of the [TeX:] $$k$$ clusters' centroids. These sets can be represented as:
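The equation itself did not survive in this version of the text; the sets would presumably take the standard form (coordinate pairs for the [TeX:] $$b$$ base stations and the [TeX:] $$k$$ centroids, with bars marking centroid coordinates as hypothetical notation):

```latex
B = \{(x_1, y_1), \ldots, (x_b, y_b)\}, \qquad
K = \{(\bar{x}_1, \bar{y}_1), \ldots, (\bar{x}_k, \bar{y}_k)\}
```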

The overall aggregated distance, [TeX:] $$O$$, associated with [TeX:] $$k$$ centroids and their members can be represented as:
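The equation is missing here; it is presumably the standard K-means objective, summing each member's Euclidean distance to its centroid over the [TeX:] $$k$$ clusters [TeX:] $$C_1, \ldots, C_k$$:

```latex
O = \sum_{j=1}^{k} \; \sum_{(x_i, y_i) \in C_j} \left\lVert (x_i, y_i) - (\bar{x}_j, \bar{y}_j) \right\rVert_2
```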

In order to determine the optimal number of clusters [TeX:] $$k$$, we experimented with different values of [TeX:] $$k$$. For each value of [TeX:] $$k$$, we obtained a distortion value, [TeX:] $$d$$, which is defined by [TeX:] $$d=O / k$$. The relationship is demonstrated in Fig. 3.
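A minimal sketch of this procedure (plain NumPy Lloyd iterations rather than a library call; the random points stand in for the real base station coordinates):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns the centroids and the overall
    aggregated distance O (sum of member-to-centroid distances)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to its nearest centroid
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # memberships (and centroids) unchanged
            break
        centroids = new
    labels = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                            axis=2).argmin(axis=1)
    O = np.linalg.norm(points - centroids[labels], axis=1).sum()
    return centroids, O

# Elbow analysis: distortion d = O / k for increasing k
points = np.random.default_rng(1).random((500, 2))  # stand-in for BS coordinates
distortions = {k: kmeans(points, k)[1] / k for k in (5, 10, 20, 30)}
```

Plotting `distortions` against `k` reproduces the elbow-style curve used to pick the number of zones.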

From Fig. 3, it can be seen that as the value of [TeX:] $$k$$ increases beyond 20, the value of [TeX:] $$d$$ does not decrease significantly. We therefore use 20 as the value of [TeX:] $$k$$ and divide the base stations in the city into 20 zones. For each of these 20 zones, we propose the establishment of a data center to process the traffic that the base stations within the zone experience. Fig. 4 shows the locations of the base stations within the different zones in Milan; entries of similar color and shape represent the base stations within a specific zone. For the sake of illustration, zones 6, 14 and 19 are labeled in Fig. 4. Fig. 5 demonstrates the architecture of our proposed model: the city is broken down into 20 zones and, for each zone, a location and a capacity are designed for a data center charged with handling the traffic within that zone.

In this section, we select three zones and analyse the traffic profile of each. Fig. 6 shows the total volume of traffic that each of the 20 zones experienced over the duration covered by the dataset. It can be observed from Fig. 6 that zone 6 experiences the least traffic, zone 19 the highest, and zone 14 an average amount. We therefore present the traffic profile analysis of these three zones.

Based on the aggregated hourly data obtained in Section III-A, we determine the maximum and average hourly traffic volumes of these zones during holidays and workdays. In addition, we present the sum of the mean and one standard deviation to illustrate the amount of variation in each hour of traffic. Despite the differences in the traffic profiles of these zones, they share some common phenomena. The detailed analysis of the traffic profiles of these zones is presented below.

Common Characteristics of the Zones: Figs. 7 (a), (c) and (e) represent the holiday traffic profiles of Zones 6, 14 and 19 respectively, while (b), (d) and (f) of the same figure present the workday traffic profiles of these zones. Tables VI, X and XIX in [43] provide additional information that aids the analysis of these traffic profiles. For both holiday and workday traffic, the presence of a 'tidal effect' is evident. Traffic gradually declines during the late night hours (starting from around 8 pm) and reaches its minimum around 4 am. Traffic consumption then gradually increases from the early morning hours (5 am to 8 am) on both types of days. The magnitude of the standard deviation observed in these traffic volumes also grows as the traffic volume increases. Another important quantity of interest is the probability with which the traffic volumes of past days fall within the sum of the mean and a number of standard deviations. It is observed that, for both holidays and workdays, the hourly traffic of past days mostly falls within the sum of the mean and one standard deviation with a probability of 75 percent or more. This probability increases to the range of 90 to 100 percent when the sum of the mean and two standard deviations is considered. This probability analysis enables us to understand the hourly variation in traffic volume, which further aids the subsequent dimensioning of the data centers. The inherent traffic characteristics of each of the three zones are explained below.

Zone 6: This zone experiences the lowest amount of traffic among all zones. It is located on the outskirts of the city and is sparsely populated. Fig. 4 shows the geographical locations of the base stations within this zone. There are a total of 47 TIM base stations (BSs) in this area. This relatively low number of BSs reflects the low level of traffic this zone experiences on both holidays and workdays, as seen from Figs. 7 (a) and 7 (b). Some key features of the traffic volumes in this zone are presented in Table VI in [43].

Zone 14: This zone experiences a medium amount of traffic and covers areas close to the center of the city. There are 166 BSs in this zone, significantly more than in Zone 6. This is expected given the larger volume of traffic experienced in these areas over time. Figs. 7 (c) and (d) present traffic information for this zone, with Fig. 4 showing the geographical locations of its BSs. Table XIV in [43] presents some of the attributes observed for the traffic in zone 14.

Zone 19: This zone experiences the highest volumes of traffic of all zones. As seen from Fig. 4, it lies in the heart of Milan and experiences heavy traffic volumes. Services and financial companies form this predominantly commercial zone. To meet the traffic demands in these areas, 309 TIM base stations are located here. Figs. 7 (e) and (f) present key statistics of the holiday and workday traffic in this zone. Fig. 4 also illustrates the locations of these BSs within zone 19, with Table XIX in [43] presenting some key features of this zone's traffic volumes.

In this section, we use the locations of the base stations to first heuristically determine the ideal location for a data center in each of the considered zones. We then determine the dimension of each data center so that it can meet the traffic demand of its zone.

One consideration when determining the position of a data center is its distance from the base stations. Minimizing the aggregate distance between a data center and its base stations reduces the cost of the fronthaul links and also lowers the propagation delay. The problem of determining the ideal location for such a facility can be identified as Weber's problem, a special case of the Facility Location Problem [44].

The aim of any facility location problem is to determine the most suitable place to establish one or more facilities in the presence of many candidate locations. The facilities are usually required to provide services that meet the demands imposed by their customers, whose locations are known. In our case, the facilities are the data centers, which provide the base stations (the customers) with computational power to process the traffic experienced at each base station. The Weber problem seeks a point that minimizes the weighted sum of distances from that point to the known base station locations [45]. Since we consider the distances of the base stations from the data center as the quantity to minimize, the following mathematical model can be employed:
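The model itself is missing from this version of the text; it is conventionally the weighted Weber objective over the [TeX:] $$m$$ base stations [TeX:] $$C_1, \ldots, C_m$$ of cluster [TeX:] $$C$$:

```latex
\min_{Y} \; \sum_{i=1}^{m} w_{i} \, \lVert Y - C_{i} \rVert_{2}
```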

here [TeX:] $$w_{i}$$ denotes the weight assigned to the [TeX:] $$i^{th}$$ base station, among the m base stations belonging to the base station cluster [TeX:] $$C$$. The cost function in this optimization framework minimizes the overall distance between the base stations in a particular cluster [TeX:] $$C$$ and a candidate data center location [TeX:] $$Y$$. Some base stations within a zone often experience significantly higher volumes of traffic than others. It is therefore logical to place greater weights on the base stations that experience heavier load and to place the data center closer to them. As we have 20 zones, each with a certain number, [TeX:] $$q_{z}$$, of base stations, we can define a set [TeX:] $$A_{z}$$ that contains the traffic volume of each base station within zone [TeX:] $$z$$. [TeX:] $$A_{z}$$ can therefore be represented as:
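The set itself is missing here; given the symbol table, it would read:

```latex
A_z = \{ v_{1z}, v_{2z}, \ldots, v_{q_z z} \}
```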

where [TeX:] $$v_{iz}$$ is the traffic volume of the [TeX:] $$i^{th}$$ base station in zone [TeX:] $$z$$. To determine the weight, [TeX:] $$w_{iz}$$, of the [TeX:] $$i^{th}$$ base station within zone [TeX:] $$z$$, we use the following equation:
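The weight equation is absent from this version of the text; the min–max normalization below is an assumed reconstruction, consistent with the [TeX:] $$max(A_{z})$$ and [TeX:] $$min(A_{z})$$ terms defined next:

```latex
w_{iz} = \frac{v_{iz} - \min(A_z)}{\max(A_z) - \min(A_z)}
```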

The values of [TeX:] $$max(A_{z})$$ and [TeX:] $$min(A_{z})$$ are the traffic volumes of the most and least loaded base stations within zone [TeX:] $$z$$, respectively. Knowing the weight of each base station in a zone, we can use the Weiszfeld procedure [46] to determine the data center location in each zone. We choose this algorithm for its proven efficiency and low computational complexity [47]. The Weiszfeld algorithm is a gradient-descent method that minimizes the sum of the weighted [TeX:] $$l_{2}$$ norms over the elements of the base station group [TeX:] $$C_{i}$$, iterating to obtain the best possible location for the establishment of the data center.

As each zone has its own boundaries in terms of latitude and longitude, the possible location for a data center in the [TeX:] $$z^{th}$$ zone must be contained in a 2-dimensional vector space, [TeX:] $$J$$, which also includes the locations of all the BSs within that zone. The algorithm begins at a random coordinate point with latitude [TeX:] $$(x_{1z})$$ and longitude [TeX:] $$(y_{1z})$$ and attempts to locate the point within the set [TeX:] $$J$$ that minimizes the weighted sum of Euclidean distances to the BSs within that zone. The [TeX:] $$x$$ and [TeX:] $$y$$ values are calculated using the formula:
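The update formulas are missing from this version of the text; in the paper's notation, the standard Weiszfeld iteration (with [TeX:] $$o$$ ranging over the base stations of zone [TeX:] $$z$$ and [TeX:] $$d_{oj}$$ the distance from the current candidate [TeX:] $$j$$ to station [TeX:] $$o$$) would read:

```latex
x_{(j+1)z} = \frac{\sum_{o} w_{oz}\, x_{oz} / d_{oj}}{\sum_{o} w_{oz} / d_{oj}},
\qquad
y_{(j+1)z} = \frac{\sum_{o} w_{oz}\, y_{oz} / d_{oj}}{\sum_{o} w_{oz} / d_{oj}}
```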

where [TeX:] $$x_{jz}$$ and [TeX:] $$y_{jz}$$ represent the [TeX:] $$j^{th}$$ candidate location for the data center in zone [TeX:] $$z$$, and [TeX:] $$d_{oj}$$ represents the distance between the candidate data center location and a point in set [TeX:] $$J$$. [TeX:] $$w_{jz}$$ represents the weight assigned to the [TeX:] $$j^{th}$$ base station in zone [TeX:] $$z$$. The iterations continue until either convergence is reached or the maximum number of evaluations is completed.
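A compact sketch of the procedure (NumPy; starting from the centroid rather than a random point, and clamping distances to avoid division by zero when the candidate lands on a base station):

```python
import numpy as np

def weiszfeld(bs_coords, weights, max_iter=1000, tol=1e-9):
    """Weighted geometric median of a zone's base stations.

    bs_coords: (n, 2) array of (latitude, longitude) pairs
    weights:   (n,) array of base station weights w_iz
    """
    y = bs_coords.mean(axis=0)                    # initial candidate location
    for _ in range(max_iter):
        d = np.linalg.norm(bs_coords - y, axis=1)
        d = np.maximum(d, 1e-12)                  # guard against d_oj = 0
        coef = weights / d
        y_next = (coef[:, None] * bs_coords).sum(axis=0) / coef.sum()
        if np.linalg.norm(y_next - y) < tol:      # converged
            return y_next
        y = y_next
    return y                                      # iteration budget exhausted
```

With equal weights and four base stations at the corners of a unit square, the procedure returns the center (0.5, 0.5); sharply increasing one station's weight pulls the data center towards that station, as intended.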

With the aid of this algorithm, we determine the ideal location for each data center in Zones 6, 14 and 19. The locations of the data centers among the base stations are marked with rectangular boxes in Fig. 8.

Once the data center location is established, it becomes critically important to determine its dimension. This largely depends upon the traffic demand that the data center is expected to serve. For the purpose of this work, as we only have information for a single mobile operator (TIM), we size the data centers based on the traffic volumes experienced by its BSs in the zones under consideration. In a real-life design case, infrastructure providers would be expected to lease their services to multiple operators and would therefore require the corresponding information from those operators as well. We also focus only on the resources, in terms of computational power, required to process the traffic in these zones.

The computational power is provided by servers within a data center. An area with a large number of BSs requires a greater amount of computational resources (CPU cores provided by the servers) to serve the traffic demands as well as to host various VNFs such as SDN controllers and virtual gateways. We assume that the VNFs for a particular zone are all hosted in a single centralized data center rather than being distributed. This approach requires fewer servers, and consequently fewer cores, than a distributed VNF architecture [15]. VNFs that serve both data and control plane functionalities require more computational power than SDN controllers, which deal only with control plane functionalities. The authors in [48] demonstrated that 20 CPU cores are required to handle 1 unit of data traffic demand (i.e., 1 Gbps) while only 6 cores are required by the SDN controllers; in total, 26 cores are needed to process 1 Gbps of traffic load and overhead. In our model, we adopt this specification from [48] to design the dimension of each zone's data center based on the traffic profile analysis conducted previously.

When determining the processing power required by a data center (the data center's capacity), we need to carefully evaluate the traffic profiles that the base stations under its coverage experience. As we have only 62 days' worth of data, certain traffic characteristics might not have been captured within this time frame. Allocating resources to meet just the maximum of the peak-hour traffic would lead to over-provisioning of resources that would remain underutilized most of the time. Conversely, having only enough servers to meet the average demand would lead to a shortage of resources during peak hours, and thereby to poor quality of service (QoS). Referring to the tables in [43], a good metric for the volume of traffic that a data center needs to be designed for is the sum of the average and standard deviations of the traffic volume. The hourly traffic volume of the zones surpassed the sum of the mean and one standard deviation on a considerable number of occasions; however, most of these volumes fell well within the sum of the mean and two standard deviations. The ideal traffic volume that a data center needs to cater for therefore lies somewhere between these two. We heuristically determine the ideal design capacity, [TeX:] $$V_{z}$$, of the data center in zone [TeX:] $$z$$ to be:
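Equation 8 is missing from this version of the text; the following form reproduces the zone values reported later (e.g. 1.30 × (133,907.90 + 34,688.45) ≈ 219,175 MB for zone 14):

```latex
V_z = D_z \left( g(mean)_z + f(std)_z \right) \tag{8}
```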

[TeX:] $$f(std)_{z}$$ and [TeX:] $$g(mean)_{z}$$ are the standard deviation and mean values of the hour with the maximum sum of mean and one standard deviation in zone [TeX:] $$z$$. [TeX:] $$D_{z} \in[1,2]$$ is the multiplier which determines the maximum traffic volume, [TeX:] $$V_{z}$$, that the data center in the [TeX:] $$z^{th}$$ zone would be capable of serving at any given time. This multiplier is inversely proportional to the probability of the [TeX:] $$h^{th}$$ hour's traffic volume falling within the sum of the mean and one standard deviation in zone [TeX:] $$z$$, [TeX:] $$P_{h}(mean+1std)_{z}$$. In this work, the [TeX:] $$D_{z}$$ value is 1.6 when [TeX:] $$P_{h}(mean+1std)_{z}$$ is 0.6, and [TeX:] $$D_{z}$$ is 1 when [TeX:] $$P_{h}(mean+1std)_{z}$$ is 1. [TeX:] $$D_{z}$$ can then be obtained using the following heuristically obtained mathematical relationship:
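Equation 9 did not survive extraction; a linear relation consistent with the stated anchor points ([TeX:] $$D_{z}=1$$ at probability 1, [TeX:] $$D_{z}=1.6$$ at probability 0.6) and with the constants [TeX:] $$\alpha=-0.6$$ and [TeX:] $$P_{s}=0.6$$ is:

```latex
D_z = 1 + \frac{\alpha}{1 - P_s}
      \left( \max\left(P_h(mean+1std)\right)_z - 1 \right) \tag{9}
```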

where [TeX:] $$\alpha$$ for this dataset is −0.6 and [TeX:] $$P_{s}$$ is 0.6, as mentioned above. [TeX:] $$max(P_{h}(mean+1std))_{z}$$ denotes the probability value for the hour that exhibits the highest sum of mean and one standard deviation [TeX:] $$(mean+1std)$$ of traffic volume in that zone. For example, in zone 6, we can see from Table VI in [43] that the highest value of [TeX:] $$mean+1std$$ is observed for hour 8 (between 8 am and 9 am) of the holiday traffic, which corresponds to a [TeX:] $$P_{8}(mean+1std)_{6}$$ value of 0.95. Therefore, to determine the deviation factor in this zone, we use this value as our [TeX:] $$max(P_{h}(mean+1std))_{6}$$. With this, we can determine the traffic volume that the data center in zone [TeX:] $$z$$ needs to be designed for using equation 8.

As mentioned previously, based on the specification in [48], 26 cores of processing power are required to process one unit of traffic, i.e., 1 Gbps. Given the aggregated nature of our data set, it is not possible to evaluate the demand volume experienced per second in each zone. Therefore, in this work, we assume that the same amount of demand is generated every second, resulting in a cumulative volume of [TeX:] $$V_{z}$$ in the [TeX:] $$z^{th}$$ zone per hour. Note also that the capacity of the fronthaul links between the base stations and the data center plays a crucial role in the processing of the traffic demand; this, however, is beyond the scope of this work. Using the above equations and specifications, we can proceed to determine the capacity of each data center.
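Using the capacity-multiplier and design-volume relations described above together with the 26-cores-per-Gbps specification from [48], the per-zone computation can be sketched as follows (a reconstruction from the quoted zone figures; the MB-to-Gb conversion assumes 1024 MB per Gb and 8 bits per byte, which matches the Gb values quoted below):

```python
ALPHA, PS = -0.6, 0.6        # capacity design constants (Table I)
CORES_PER_GBPS = 26          # 20 data-plane VNF cores + 6 SDN controller cores [48]

def data_center_design(mean_mb, std_mb, p_max):
    """Design volume V_z (MB per hour) and required CPU cores for one zone.

    mean_mb, std_mb: mean/std of the hour with the largest (mean + 1 std)
    p_max: max(P_h(mean+1std))_z for that hour
    """
    d_z = 1 + (ALPHA / (1 - PS)) * (p_max - 1)      # capacity multiplier in [1, 2]
    v_z_mb = d_z * (mean_mb + std_mb)               # hourly design volume (MB)
    v_z_gb = v_z_mb * 8 / 1024                      # MB/hour -> Gb/hour
    cores = round(v_z_gb / 3600 * CORES_PER_GBPS)   # uniform per-second demand
    return v_z_mb, cores
```

Zone 6 (mean 27,350.04 MB, std 7,237.04 MB, p = 0.95) yields ≈ 37,181 MB and 2 cores; zone 14 yields ≈ 219,175 MB and 12 cores; zone 19 yields 971,085 MB and 55 cores, matching the values reported for the three zones.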

Zone 6 Data Center: This light-traffic zone, as mentioned above, possesses a [TeX:] $$max(P_{h}(mean+1std))_{6}$$ value of 0.95, based on the maximum sum of mean and one standard deviation observed at the [TeX:] $$8^{th}$$ hour. Using equation 9, we obtain the multiplier of zone 6, [TeX:] $$D_{6}$$, to be 1.07. The corresponding mean and standard deviation values of this hour's traffic are 27,350.04 MB and 7,237.04 MB respectively. Therefore, using equation 8, we obtain 37,181 MB (290 Gb) as the maximum volume of traffic that this data center would be required to handle in any given hour. Note that this value is greater than any of the peak hourly traffic values for both working days and holidays in the available data, while remaining much smaller than the sum of the mean and two standard deviations. As such, it provides ample tolerance to meet the highest traffic demand that might be encountered in this zone. Using our assumptions and specifications, we evaluate that the data center for this zone would require a maximum of 2 cores to process the traffic volume of the TIM subscribers in this zone.

Zone 14 Data Center: Zone 14's medium-level traffic has a [TeX:] $$max(P_{h}(mean+1std))_{14}$$ value of 0.8, corresponding to the [TeX:] $$18^{th}$$ hour of the workday traffic. The deviation factor of this zone, [TeX:] $$D_{14}$$, is evaluated to be 1.30 using equation 9. With corresponding mean and standard deviation values of 133,907.90 MB and 34,688.45 MB respectively, we determine the maximum volume of traffic that the data center of this zone would have to handle in any given hour to be 219,175 MB (1,712 Gb). To fulfill this volume of traffic demand, the servers in this zone's data center need 12 cores of CPU power.

Zone 19 Data Center: The traffic level of this area surpasses all others, with a [TeX:] $$max(P_{h}(mean+1std))_{19}$$ value and deviation factor, [TeX:] $$D_{19}$$, of 1 for the [TeX:] $$13^{th}$$ hour of the holiday traffic. The mean and standard deviation values corresponding to this hour are 746,016.40 MB and 225,068.60 MB respectively. As expected, the maximum volume of traffic that the data center in this zone needs to cater for, 971,085 MB (7,586 Gb), is also the highest of all zones. To satisfy this level of demand, the capacity of this data center has to be greater than the others: it would require 55 cores to process the traffic demands of the subscribers.

In this section, we employ several state-of-the-art recurrent neural network (RNN) models to forecast the next day's traffic in each zone based on previously collected data. The idea is to have the future traffic demand of these areas at hand to facilitate the operation of the data centers. Accurate prediction models aid the operation of the data centers and can lower the operational cost for infrastructure providers, as the unused capacity of these data centers can be put into sleep mode, reducing energy consumption.

RNNs have proved to be an effective tool for prediction on time-series data. Given that our hourly aggregated traffic demand per zone is inherently seasonal, RNN models can be used to forecast future demand. We use two RNN models: long short-term memory (LSTM) and gated recurrent unit (GRU). We also test the fitness of two activation functions, rectified linear unit (ReLU) and hyperbolic tangent (tanh), to determine the combination that produces the most accurate results. As the holiday demand differs from the workday demand, we tested these models on each type of day for the considered zones. Below we briefly explain the concept of RNNs and the models that have been utilized in this section of the work.

The main idea of an RNN is to capture and store a relevant amount of information from the input in a memory and to use it when making a future prediction for the output. This is a fundamental difference between RNNs and traditional feed-forward neural networks, which make use of only the present input to produce an output. RNNs are termed recurrent because they perform the same operation on every element of a sequence, whereby the output of the present step is heavily influenced by that of previous steps. Fig. 9 illustrates a typical RNN model and its ability to combine previous inputs with the present one to predict the future output.

An RNN takes in the input [TeX:] $$i$$, captures the hidden state [TeX:] $$a$$ and produces an output [TeX:] $$o$$ at every time step [TeX:] $$t$$. The information from one step to the following one is carried by a loop. The [TeX:] $$W$$'s stand for the weight matrices applied at each time step. These matrices are adjusted during the training phase as the network is 'unrolled' for a certain number of time steps. As shown in Fig. 9, this unrolling of the network over time steps allows the RNN to learn the information present in sequential data. The computation that takes place in every time step can be summarised as follows:


1. [TeX:] $$i_{t}$$ serves as the input in time step [TeX:] $$t$$.

2. The hidden state [TeX:] $$a_{t}$$ at time step [TeX:] $$t$$ is calculated from the previous hidden state and the present input. These two pieces of information are combined through an activation function such as ReLU or tanh.

3. The output step at time step [TeX:] $$t$$ is termed as [TeX:] $$o_{t}$$.

With different inputs [TeX:] $$i_{t}$$ at different time steps, the same computations are performed with the shared parameters [TeX:] $$W_{ia}$$, [TeX:] $$W_{aa}$$ and [TeX:] $$W_{ao}$$. This attribute of RNNs makes them extremely useful for smaller data sets, as it helps avoid overfitting. Two common RNN models in use today are the LSTM and the GRU. We provide a brief explanation of the working principles of these models along with the activation functions.
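The three steps above can be sketched as a vanilla RNN forward pass (plain NumPy; tanh chosen as the activation, with the same weight matrices shared across all time steps):

```python
import numpy as np

def rnn_forward(inputs, W_ia, W_aa, W_ao, a0=None):
    """Unrolled vanilla RNN: a_t = tanh(W_ia i_t + W_aa a_{t-1}), o_t = W_ao a_t.

    inputs: (T, n_in) sequence; W_ia, W_aa, W_ao are shared across time steps.
    """
    hidden = W_aa.shape[0]
    a = np.zeros(hidden) if a0 is None else a0
    outputs = []
    for i_t in inputs:                       # one step per element of the sequence
        a = np.tanh(W_ia @ i_t + W_aa @ a)   # combine past state with new input
        outputs.append(W_ao @ a)             # present output depends on history
    return np.array(outputs), a
```

The gated LSTM and GRU cells described next replace the single tanh update with learned gates that control how much of `a` is kept.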

LSTM: The hidden state in a traditional RNN does not provide enough control over how much of the past information should be kept, which leads to problems such as vanishing and exploding gradients [49]. To overcome these problems, LSTM models were designed with two additional gates, termed the input and forget gates. This gating mechanism allows LSTMs to adequately model the long-term dependencies present in complex nonlinear data. An LSTM essentially learns the optimal parameters for its gates during the training phase, thereby determining the behavior of its memory. Interested readers are referred to [50] for more details on the LSTM.

GRU: Due to the presence of both input and forget gates, the LSTM model often becomes computationally expensive. The GRU, a more recent addition to the RNN family, presents a simpler architecture in which the input and forget gates are combined into an update gate. The basic idea of capturing and learning long-term dependencies in time-series data is maintained in the GRU as well. A detailed explanation of the GRU model can be found in [51].

Activation functions: The activation functions play an important role in the ability of RNN models to make accurate future predictions. The two activation functions we use in this work are ReLU and tanh. Fig. 10 (a) shows the ReLU activation function and Fig. 10 (b) the tanh activation function.

We test the fitness of the LSTM and GRU models on the zonal data to predict future demands, employing each model with both the ReLU and tanh activation functions. We follow a 70:30 train-test split convention, i.e., the first 70 percent of both the holiday and workday data is used for training the neural network and the rest for testing. In addition, we use the average hourly traffic as a baseline for comparison. We use Google's open-source machine learning platform TensorFlow on a computer with a 2.6 GHz, 4-core CPU and an NVIDIA GTX 970 graphics card to analyse the performance of these algorithms on holiday and workday data. The neural network was designed with 2 hidden layers of 50 neurons each; the input and output dimensions are both 1 × 1. The obtained results are explained below:
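The two error metrics reported below can be computed as follows (SMAPE in its common 0–200 percent formulation; the exact variant used in the figures is an assumption):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and forecast traffic."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def smape(actual, predicted):
    """Symmetric mean absolute percentage error (0-200 scale)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    denom = (np.abs(actual) + np.abs(predicted)) / 2  # symmetric denominator
    return 100 * np.mean(np.abs(actual - predicted) / denom)
```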

Holiday Prediction: Figs. 11 (a), (c) and (e) demonstrate the performance of the considered algorithms on the holiday data of zones 6, 14 and 19 over a three-day period, respectively. Note that the holidays contain fewer data points than the workdays (528 versus 960). The LSTM and GRU models generally perform well, with an accuracy of 90 percent or more across all zones and both activation functions. The average hourly traffic, however, is clearly incapable of capturing the traffic trend over this three-day period. Figs. 12 (a) and (c) show the average root mean square error (RMSE) and symmetric mean absolute percentage error (SMAPE) of these algorithms on the holiday data set. It is also observed that a different model emerges as the best when predictions are made across different zones; it can therefore be concluded that no single prediction model yields an accurate forecast of future traffic in all zones.

Workday Prediction: The performance of the considered machine learning algorithms on the slightly larger workday data set is presented in Figs. 11 (b), (d), and (f). Once more, the GRU and LSTM algorithms performed similarly to each other and were able to forecast traffic with great accuracy, while the average once again proved insufficient for this purpose. Figs. 12 (b) and (d) show the RMSE and SMAPE values of each algorithm on the workday data set. The GRU and LSTM models predict with the least error while maintaining an accuracy of greater than 95 percent. As with holiday prediction, the model that makes the best prediction varies across zones.

Figs. 13 (a) and (b) demonstrate the time it takes each of these algorithms to complete training and make predictions on the holiday and workday data, respectively. As expected, owing to the presence of an additional gate in the LSTM architecture, it requires a slightly longer runtime than the simpler GRU architecture.
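A minimal way to reproduce this kind of runtime comparison is to wrap each training/prediction call in a wall-clock timer; the helper below is an illustrative sketch, and `dummy_train` is a stand-in workload, not the actual LSTM/GRU training.

```python
import time

def timed(fn, *args, **kwargs):
    # Wall-clock timing of a single training or prediction call.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def dummy_train(n):
    # Placeholder compute-bound workload standing in for model training.
    return sum(i * i for i in range(n))

_, elapsed = timed(dummy_train, 100000)
print(f"runtime: {elapsed:.4f} s")
```

Timing each model under identical data and hardware, as done here for LSTM and GRU, keeps the comparison fair.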

With the aid of machine learning algorithms, infrastructure providers could determine the hourly demand that a particular MNO will generate within a zone. In the absence of an accurate traffic forecasting mechanism, the allocation of data center resources would be reactive, i.e., resources would be allocated only once demand arises. This can lead to congestion and degradation of QoS, as it is difficult to allocate the proper amount of resources purely in reaction to demand. Furthermore, from the data center's point of view, knowing future demand values aids the utilization of its resources. During hours when a relatively low volume of traffic is predicted in a certain zone, its data center can keep operational only the resources needed to cater for that predicted volume of traffic, with the rest aggressively put into idle/sleep mode [52], [53]. This eliminates the need to keep all data center resources constantly active. Keeping additional resources inactive lowers the energy consumed by the data center and significantly reduces operational expenses for the infrastructure provider.
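The sleep-mode policy described above reduces to a simple calculation: given a forecast traffic volume and a per-server capacity, keep awake only the servers needed and put the rest to sleep. The sketch below illustrates this; the capacity figures and function name are hypothetical, not values from the paper.

```python
import math

def active_servers(predicted_traffic, capacity_per_server, total_servers):
    # Keep just enough servers awake to serve the forecast demand;
    # the remainder can be placed in idle/sleep mode.
    needed = math.ceil(predicted_traffic / capacity_per_server)
    active = min(max(needed, 1), total_servers)  # always keep at least one awake
    return active, total_servers - active

active, sleeping = active_servers(
    predicted_traffic=3500, capacity_per_server=1000, total_servers=10)
print(active, sleeping)  # 4 6
```

In practice one would add headroom on top of the forecast (as the paper's capacity multiplier does at the zone level) before rounding up.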

In this paper, we analysed the open big data set of Telecom Italia to determine the traffic profiles that exist in different zones within the city of Milan. We processed the data set to obtain the hourly cellular traffic demand that arises in different parts of the city over the course of the day. Using the K-means clustering algorithm, we split the city of Milan into 20 zones and isolated three of them (Zone 6, Zone 14, and Zone 19) that exhibit the lowest, medium, and highest volumes of traffic, respectively. Based on the location of and traffic handled by each base station in a zone, we proposed the establishment of a data center to host the VNFs and SDN controllers in each zone. We formulated the placement of each data center as a facility location problem, which we solved using Weiszfeld's algorithm. Furthermore, based on the traffic profile of each zone, we heuristically determined the ideal dimension of a data center capable of handling the traffic within that zone. Finally, we used machine learning algorithms to predict future demand to enhance the operation of the data center in each of the considered zones. Results showed the ability of the LSTM and GRU models to predict future demand values with considerably high accuracy.
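For reference, Weiszfeld's algorithm mentioned above amounts to an iteratively re-weighted average converging to the weighted geometric median of the base-station coordinates. The sketch below uses illustrative station coordinates and traffic weights, not the paper's TIM data.

```python
import numpy as np

def weiszfeld(points, weights, iters=200, eps=1e-9):
    # Iteratively re-weighted centroid converging to the weighted
    # geometric median (the facility-location optimum Y for a zone).
    pts = np.asarray(points, float)
    w = np.asarray(weights, float)
    y = np.average(pts, axis=0, weights=w)  # start at the weighted centroid
    for _ in range(iters):
        d = np.linalg.norm(pts - y, axis=1)
        d = np.maximum(d, eps)               # guard against division by zero
        coef = w / d
        y_new = (coef[:, None] * pts).sum(axis=0) / coef.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

# Toy zone: four stations, one carrying twice the traffic of the others.
stations = [(0, 0), (10, 0), (0, 10), (10, 10)]
traffic = [1, 1, 1, 2]
print(weiszfeld(stations, traffic))
```

The heavier station pulls the optimum toward itself, which is exactly the behaviour wanted when weighting candidate data center locations by traffic volume.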

Udita Paul received his M.Sc. in Electrical Engineering from the University of Cape Town in 2018. He is currently pursuing a Ph.D. degree in Computer Science at the University of California, Santa Barbara with the MOMENT lab. His primary research interests include 5G, network slicing, network function virtualization, edge computing, and machine learning algorithms.

Sebastian Troia received the B.Sc. and M.Sc. degrees in telecommunications engineering from the Politecnico di Milano in 2013 and 2016, respectively, where he is currently pursuing a Ph.D. degree in information technology with the BONSAI LAB. His research interests are related to machine learning algorithms for communications networks, software-defined networking, network orchestration automation-optimization, SD-WAN, and optical multipath routing.

- [1] Cisco Visual Networking Index (VNI) Update: Global Mobile Data Traffic Forecast, 2016–2021, Cisco, 2017.
- [2] Next Generation Mobile Networks Alliance, "NGMN 5G Initiative White Paper," Feb. 2015. [Online]. Available: https://www.ngmn.org/uploads/media/NGMN-5G-White-Paper-V1-0.pdf
- [3] Network Functions Virtualisation (NFV); Management and Orchestration, ETSI, Sophia Antipolis, France, 2014. [Online]. Available: https://www.etsi.org/deliver/etsi-gs/NFV-MAN/001-099/001/01.01.0160/gs-NFV-MAN001v010101p.pdf
- [4] SDN Architecture, Open Netw. Found., Palo Alto, CA, USA, 2014. [Online]. Available: https://www.opennetworking.org/images/stories/downloads/sdn-resources/technical-reports/TR-SDN-ARCH1.0-06062014.pdf
- [5] B. Martini, F. Paganelli, P. Cappanera, S. Turchi, and P. Castoldi, "Latency-aware composition of virtual functions in 5G," in Proc. NetSoft, pp. 1–6, Apr. 2015.
- [6] R. Pal, S. Lin, and L. Golubchik, "The cloudlet bazaar dynamic markets for the small cloud," arXiv preprint arXiv:1704.00845, 2017.
- [7] M. Satyanarayanan, "The emergence of edge computing," Computer, vol. 50, no. 1, pp. 30–39, Jan. 2017. doi:10.1109/MC.2017.9
- [8] 5G Network Architecture: A High-Level Perspective, Huawei, 2016. [Online].
- [9] W. Rankothge, F. Le, A. Russo, and J. Lobo, "Optimizing resource allocation for virtualized network functions in a cloud center using genetic algorithms," IEEE Trans. Netw. Service Manag., vol. 14, no. 2, pp. 343–356, June 2017.
- [10] R. Mijumbi et al., "Topology-aware prediction of virtual network function resource requirements," IEEE Trans. Netw. Service Manag., vol. 14, no. 1, pp. 106–120, Mar. 2017. doi:10.1109/TNSM.2017.2666781
- [11] K. Suksomboon, M. Fukushima, M. Hayashi, R. Chawuthai, and H. Takeda, "LawNFO: A decision framework for optimal location-aware network function outsourcing," in Proc. NetSoft, pp. 1–9, June 2015.
- [12] A. Laghrissi, T. Taleb, M. Bagaa, and H. Flinck, "Towards edge slicing: VNF placement algorithms for a dynamic realistic edge cloud environment," in Proc. IEEE GLOBECOM, pp. 1–6, Dec. 2017.
- [13] R. Tripathi, S. Vignesh, V. Tamarapalli, and D. Medhi, "Cost efficient design of fault tolerant geo-distributed data centers," IEEE Trans. Netw. Service Manag., vol. 14, no. 2, pp. 289–301, June 2017. doi:10.1109/TNSM.2017.2691007
- [14] K. Zheng, W. Zheng, L. Li, and X. Wang, "PowerNetS: Coordinating data center network with servers and cooling for power optimization," IEEE Trans. Netw. Service Manag., vol. 14, no. 3, pp. 661–675, Sept. 2017. doi:10.1109/TNSM.2017.2711567
- [15] A. Basta et al., "Towards a cost optimal design for a 5G mobile core network based on SDN and NFV," IEEE Trans. Netw. Service Manag., vol. 14, no. 4, pp. 1061–1075, Dec. 2017. doi:10.1109/TNSM.2017.2732505
- [16] A. Furno, D. Naboulsi, R. Stanica, and M. Fiore, "Mobile demand profiling for cellular cognitive networking," IEEE Trans. Mobile Comput., vol. 16, no. 3, pp. 772–786, Mar. 2017. doi:10.1109/TMC.2016.2563429
- [17] S. Wang et al., "An approach for spatial-temporal traffic modeling in mobile cellular networks," in Proc. IEEE ITC, pp. 203–209, Sept. 2015.
- [18] S. Troia, G. Sheng, R. Alvizu, G. A. Maier, and A. Pattavina, "Identification of tidal-traffic patterns in metro-area mobile networks via matrix factorization based model," in Proc. PerCom Workshops, pp. 297–301, Mar. 2017.
- [19] R. Li et al., "The learning and prediction of application-level traffic data in cellular networks," IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3899–3912, June 2017. doi:10.1109/TWC.2017.2689772
- [20] R. Li, Z. Zhao, X. Zhou, J. Palicot, and H. Zhang, "The prediction analysis of cellular radio access network traffic: From entropy theory to networking practice," IEEE Commun. Mag., vol. 52, no. 6, pp. 234–240, June 2014. doi:10.1109/MCOM.2014.6829969
- [21] L. Cui, F. R. Yu, and Q. Yan, "When big data meets software-defined networking: SDN for big data and big data for SDN," IEEE Netw., vol. 30, no. 1, pp. 58–65, Jan. 2016. doi:10.1109/MNET.2016.7389832
- [22] J. Dai and J. Li, "VBR MPEG video traffic dynamic prediction based on the modeling and forecast of time series," in Proc. IEEE NCM, p. 1752, Aug. 2009. doi:10.1109/NCM.2009.11
- [23] O. Cappe, E. Moulines, J. C. Pesquet, A. P. Petropulu, and X. Yang, "Long-range dependence and heavy-tail modeling for teletraffic data," IEEE Signal Process. Mag., vol. 19, no. 3, pp. 14–27, May 2002. doi:10.1109/79.998079
- [24] A. Soule et al., "Traffic matrices: Balancing measurements, inference and modeling," in Proc. ACM SIGMETRICS, June 2005. doi:10.1145/1071690.1064259
- [25] M. C. Falvo, M. Gastaldi, A. Nardecchia, and A. Prudenzi, "Kalman filter for short-term load forecasting: An hourly predictor of municipal load," in Proc. IASTED ASM, pp. 364–369, Aug. 2007.
- [26] F. Ashtiani, J. A. Salehi, and M. R. Aref, "Mobility modeling and analytical solution for spatial traffic distribution in wireless multimedia networks," IEEE J. Sel. Areas Commun., vol. 21, no. 10, pp. 1699–1709, Dec. 2003. doi:10.1109/JSAC.2003.815680
- [27] K. Tutschku and P. Tran-Gia, "Spatial traffic estimation and characterization for mobile communication network design," IEEE J. Sel. Areas Commun., vol. 16, no. 5, p. 804, June 1998. doi:10.1109/49.700914
- [28] T. P. Oliveira, J. S. Barbar, and A. S. Soares, "Computer network traffic prediction: A comparison between traditional and deep learning neural networks," Int. J. Big Data Intelligence, vol. 3, no. 1, pp. 28–37, Jan. 2016. doi:10.1504/IJBDI.2016.073903
- [29] C. W. Huang, C. T. Chiang, and Q. Li, "A study of deep learning networks on mobile traffic forecasting," in Proc. IEEE PIMRC, pp. 1–6, Oct. 2017.
- [30] M. Barabas, G. Boanea, A. B. Rus, V. Dobrota, and J. Domingo-Pascual, "Evaluation of network traffic prediction based on neural networks with multi-task learning and multiresolution decomposition," in Proc. IEEE ICCP, pp. 95–102, Aug. 2011. doi:10.1109/ICCP.2011.6047849
- [31] G. D'Angelo, R. Pilla, J. B. Dean, and S. Rampone, "Toward a soft computing-based correlation between oxygen toxicity seizures and hyperoxic hyperpnea," Soft Comput., vol. 22, no. 7, pp. 2421–2427, Apr. 2018. doi:10.1007/s00500-017-2512-z
- [32] C. K. Dominicini et al., "VirtPhy: Fully programmable NFV orchestration architecture for edge data centers," IEEE Trans. Netw. Service Manag., vol. 14, no. 4, pp. 817–830, Dec. 2017. doi:10.1109/TNSM.2017.2756062
- [33] S. Gebert et al., "Demonstrating the optimal placement of virtualized cellular network functions in case of large crowd events," in Proc. ACM SIGCOMM, pp. 359–360, Aug. 2014.
- [34] C. H. Liu and J. Fan, "Scalable and efficient diagnosis for 5G data center network traffic," IEEE Access, vol. 2, pp. 841–855, Aug. 2014. doi:10.1109/ACCESS.2014.2349000
- [35] M. C. Luizelli, L. R. Bays, L. S. Buriol, M. P. Barcellos, and L. P. Gaspary, "Piecing together the NFV provisioning puzzle: Efficient placement and chaining of virtual network functions," in Proc. IFIP/IEEE IM, pp. 98–106, May 2015. doi:10.1109/INM.2015.7140281
- [36] R. Shi et al., "MDP and machine learning-based cost-optimization of dynamic resource allocation for network function virtualization," in Proc. IEEE SCC, pp. 65–73, June 2015. doi:10.1109/SCC.2015.19
- [37] I. Narayanan, A. Kansal, and A. Sivasubramaniam, "Right-sizing geo-distributed data centers for availability and latency," in Proc. IEEE ICDCS, pp. 230–240, June 2017.
- [38] H. Raei, "Capacity planning framework for mobile network operator cloud using analytical performance model," Int. J. Commun. Syst., vol. 30, no. 17, pp. 1–12, June 2017. doi:10.1002/dac.3353
- [39] M. Carvalho, D. A. Menasce, and F. Brasileiro, "Capacity planning for IaaS cloud providers offering multiple service classes," Future Generation Comput. Syst., vol. 77, pp. 97–111, Dec. 2017. doi:10.1016/j.future.2017.07.019
- [40] L. Nie, D. Jiang, L. Guo, S. Yu, and H. Song, "Traffic matrix prediction and estimation based on deep learning for data center networks," in Proc. IEEE GLOBECOM Workshops, pp. 1–6, Dec. 2016. doi:10.1109/GLOCOMW.2016.7849067
- [41] G. Barlacchi et al., "A multi-source dataset of urban life in the city of Milan and the province of Trentino," Scientific Data, vol. 2, Oct. 2015. doi:10.1038/sdata.2015.55
- [42] http://opencellid.org/
- [43] https://www.scribd.com/document/383828390/Traffic-Chracteristics-ofDifferent-Zones-in-Milan
- [44] P. V. Heiningen, E. Reehuis, and T. Bäck, "Comparing a Weiszfeld's-based procedure and (1+1)-ES for solving the planar single-facility location-routing problem," in Proc. IEEE SSCI, pp. 1743–1750, Dec. 2015.
- [45] R. Z. Farahani and M. Hekmatfar, Facility Location: Concepts, Models, Algorithms and Case Studies. Springer-Verlag Berlin Heidelberg, 2009.
- [46] E. Weiszfeld, "Sur le point pour lequel la somme des distances de n points donnés est minimum," Tohoku Mathematical J., vol. 43, no. 1, pp. 335–386, 1937.
- [47] K. Aftab, R. Hartley, and J. Trumpf, "Generalized Weiszfeld algorithms for Lq optimization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 4, pp. 728–745, Apr. 2015. doi:10.1109/TPAMI.2014.2353625
- [48] A. Basta, W. Kellerer, M. Hoffmann, H. J. Morper, and K. Hoffmann, "Applying NFV and SDN to LTE mobile core gateways, the functions placement problem," in Proc. ACM SIGCOMM, pp. 33–38, Aug. 2014. doi:10.1145/2627585.2627592
- [49] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in Proc. ICML, pp. 1310–1318, June 2013.
- [50] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. doi:10.1162/neco.1997.9.8.1735
- [51] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," in Proc. NIPS, Dec. 2014.
- [52] C. Gu, Z. Li, H. Huang, and X. Jia, "Energy efficient scheduling of servers with multi-sleep modes for cloud data center," IEEE Trans. Cloud Comput. (Early Access), 2018. doi:10.1109/TCC.2018.2834376
- [53] L. Fan, C. Gu, L. Qiao, W. Wu, and H. Huang, "GreenSleep: A multi-sleep modes based scheduling of servers for cloud data center," in Proc. IEEE BIGCOM, pp. 368–375, Aug. 2017. doi:10.1109/BIGCOM.2017.16