Data Collection


Table I - List of cars used for data collection and number of instances recorded for each event per vehicle

Sr. No.   Car Model       Cat eyes   Manholes   Potholes   Speed Bumps
1         Suzuki Baleno   54         55         79         68
2         Alto            50         71         80         63
3         Aqua            57         53         82         65
5         City            51         56         56         49
6         Corolla         50         57         74         54
7         Cultus          52         61         74         54
8         GLI             53         61         59         84
9         Margalla        21         31         36         59
10        Mehran          54         60         59         69
11        Santro          47         53         41         42

Figure 1. Data collection process: the data logger was mounted at two locations (shown as red circles): 1) the dashboard and 2) above the tyres. The four windows on the right show accelerometer signals along the transversal (blue), longitudinal (red) and vertical (black) directions for each of the four anomalies: Cat eye (blue), Manhole (green), Pothole (maroon) and Speed Bump (magenta).


The purpose of our work is to learn discriminative patterns in a vehicle's acceleration over a continuous drive, emulating the real-world scenario as closely as possible. Therefore, the collected data set was highly skewed towards regular road instances, sparsely interspersed with anomalous events such as Cat eyes, Manholes, Potholes and Speed Bumps. Below we describe in detail how we collected and labelled the data.

A. Hardware

To collect the data, we designed a customized 4 x 5 cm data logger, an improved version of our team's previous work [1]. A PIC18F26K22 microcontroller was the brain of the device, with an ADXL362 3-axis accelerometer and a VK2828U7G5LF GPS module attached to it. The device stores data on an on-board SD card and can record 3-axis accelerometer readings at 100 Hz ± 10% (the tolerance is due to the inaccuracy of the ADXL362's internal clock). In this work we used an average sampling rate of 93 Hz. The GPS reported readings at 1 Hz. The data logger is powered by a 3.1 Ah battery with an active lifetime of 36 hours.
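The hardware figures above imply two quick numbers worth keeping in mind: the band of sampling rates the ±10% tolerance allows, and the average current the logger can draw to last 36 hours on 3.1 Ah. The sketch below computes both; it assumes the full rated capacity is usable, so the current figure is an idealised estimate, and the helper names are illustrative only.

```python
# Back-of-the-envelope figures derived from the hardware description.
# Assumption: the full 3.1 Ah rated capacity is usable (idealised estimate).
NOMINAL_ODR_HZ = 100.0   # ADXL362 nominal output data rate
ODR_TOLERANCE = 0.10     # +/-10% due to the internal clock
BATTERY_AH = 3.1         # battery capacity in ampere-hours
LIFETIME_H = 36.0        # reported active lifetime in hours


def odr_band(nominal_hz: float, tolerance: float) -> tuple:
    """Return the (min, max) sampling rate allowed by a relative tolerance."""
    return nominal_hz * (1 - tolerance), nominal_hz * (1 + tolerance)


def average_current_ma(capacity_ah: float, lifetime_h: float) -> float:
    """Average current draw in mA implied by capacity and lifetime."""
    return capacity_ah / lifetime_h * 1000.0


low, high = odr_band(NOMINAL_ODR_HZ, ODR_TOLERANCE)
print(f"sampling rate between {low:.0f} and {high:.0f} Hz")
print(f"implied average current: {average_current_ma(BATTERY_AH, LIFETIME_H):.1f} mA")
```

The 91 to 97 samples per second observed in practice (see Missing Values) fall inside this 90 to 110 Hz band.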

B. Data Acquisition

For data collection, a MATLAB-based GUI was developed with a push button for each of the four anomalies: Cat eyes, Manholes, Potholes and Speed Bumps. Eleven cars were driven over a distance of 22 km over the course of two months, with each continuous drive lasting 30 to 45 minutes. Fig. 2 shows the route followed by the Baleno, along with the road anomalies traversed along it. Table I summarizes the cars used and the number of events recorded for each RSD per vehicle. The data loggers were mounted at two locations, as shown in Fig. 1: 1) on the dashboard, inside the vehicle, and 2) near the tyres, outside the vehicle. The y, x and z axes were aligned with the longitudinal, transversal and vertical directions, respectively. Each driver was accompanied by a copilot who operated the MATLAB GUI and pressed the corresponding button each time the vehicle traversed an anomaly. During each drive, two files were generated for each of the two data logger locations (Top and Bottom). The first file contained the raw accelerometer readings stamped with GPS markers (start time, latitude, longitude, speed), whereas the second contained the time of each button press and the corresponding button label. We ensured that around fifty events of each RSD type were recorded during each trip. Instances where no button was pressed corresponded to regular road. Fig. 1 illustrates the entire data collection process.
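To make the two per-drive files concrete, the following is a minimal sketch of how they might be held in memory and how each accelerometer sample can be given an absolute timestamp from the GPS start time. The class and field names (`AccelLog`, `ButtonPress`, `sample_times`) are hypothetical, and it assumes the start time is stored in seconds and the samples arrive at a constant 93 Hz; it does not describe the actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AccelLog:
    """Hypothetical in-memory form of the first file (one per logger location)."""
    location: str                               # "Top" or "Bottom"
    start_time_s: float                         # GPS start time in seconds (assumed unit)
    samples: List[Tuple[float, float, float]]   # (x, y, z): transversal, longitudinal, vertical


@dataclass
class ButtonPress:
    """Hypothetical record from the second file: press time and button label."""
    time_s: float
    label: str                                  # "Cat eye", "Manhole", "Pothole" or "Speed Bump"


def sample_times(log: AccelLog, fs_hz: float = 93.0) -> List[float]:
    """Timestamp every sample, assuming a constant rate from the GPS start time."""
    return [log.start_time_s + i / fs_hz for i in range(len(log.samples))]
```

Since the GPS reports at 1 Hz, a real implementation could instead re-anchor the timestamps at every GPS marker to limit drift from the accelerometer's clock tolerance.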

C. Data Annotation

Based on the two data files generated above, preliminary labels were assigned programmatically to each 1 s non-overlapping window (93 samples per window) of the accelerometer time series, by comparing the time reported by the GPS with the time recorded by the GUI on each button press.
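The automatic labelling step amounts to mapping each button-press time to the 1 s window that contains it. The sketch below shows one way to do this, assuming the GPS and GUI clocks are synchronised and that a window with no press is regular road; the function name and the "Normal" label string are illustrative choices, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

WINDOW_S = 1.0      # 1 s non-overlapping windows (93 samples at 93 Hz)
NORMAL = "Normal"   # windows with no button press correspond to regular road


def preliminary_labels(start_time_s: float,
                       n_windows: int,
                       presses: Sequence[Tuple[float, str]]) -> List[str]:
    """Assign each button press to the 1 s window whose time span contains it.

    start_time_s: GPS time of the first sample (assumed synchronised with the GUI clock).
    presses: (press_time_s, label) pairs read from the GUI log.
    """
    labels = [NORMAL] * n_windows
    for t, label in presses:
        idx = math.floor((t - start_time_s) / WINDOW_S)
        if 0 <= idx < n_windows:   # ignore presses outside the recording
            labels[idx] = label
    return labels


print(preliminary_labels(100.0, 5, [(101.4, "Pothole"), (103.0, "Manhole")]))
```

Any press delayed past the end of the true event window lands one window late, which is exactly the kind of error the manual correction step is meant to catch.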

Afterwards, the data was loaded into a second GUI, in which an assigned annotator inspected the acceleration waveforms and corrected windows mislabelled due to human error, such as a delay in recording the event.

D. Data Cleaning and Pre-processing

Data collected in real-time applications such as RSD detection can contain several discrepancies due to human error, hardware restrictions and shortcomings in the data collection process. Below we briefly discuss how we addressed these problems to improve the quality of the collected data, followed by the steps involved in data pre-processing.

1) Missing Values: The ADXL362 sensor has a data rate of 100 Hz with ±10% variation. As a result, the number of samples recorded per second varied from 91 to 97, with more than ninety percent of the windows having a sample rate of around 93-94 Hz. To keep the sampling rate constant, we selected the average value of 93 Hz and linearly interpolated any window whose number of recorded samples differed from this rate.
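Resampling each 1 s window to exactly 93 samples by linear interpolation can be sketched as follows for one axis. It assumes the recorded samples are evenly spaced across the window and that the first and last samples mark its edges; the function name `resample_linear` is hypothetical.

```python
from typing import List, Sequence

TARGET_RATE = 93  # samples per 1 s window after resampling


def resample_linear(window: Sequence[float], n_out: int = TARGET_RATE) -> List[float]:
    """Linearly interpolate one axis of a 1 s window onto n_out evenly spaced points.

    Assumes the input samples are evenly spaced; the first and last samples are
    kept and the points in between are interpolated.
    """
    n_in = len(window)
    if n_in == n_out:
        return list(window)
    if n_in < 2:
        raise ValueError("need at least two samples to interpolate")
    out = []
    for j in range(n_out):
        pos = j * (n_in - 1) / (n_out - 1)   # fractional index into the input
        i = min(int(pos), n_in - 2)          # left neighbour (clamped at the end)
        frac = pos - i
        out.append(window[i] * (1 - frac) + window[i + 1] * frac)
    return out


# A 97-sample window (downsampled) and a 91-sample window (upsampled) both become 93.
print(len(resample_linear(list(range(97)))), len(resample_linear(list(range(91)))))
```

The same function is applied to the x, y and z axes of each window independently.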

2) Windowing: The entire acceleration time series was divided into 1 s (93-sample) non-overlapping windows, and each window was labelled as Anomaly (positive) or Normal (negative) for binary classification.
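The windowing and binary labelling step can be sketched as below: the resampled series is cut into consecutive 93-sample blocks and every anomaly type is collapsed into the positive class. It assumes one label string per window (for example from the annotation step) and drops a trailing partial window; both are illustrative choices.

```python
from typing import List, Sequence, Tuple


def make_windows(signal: Sequence[float], labels: Sequence[str],
                 size: int = 93) -> Tuple[List[List[float]], List[int]]:
    """Split a resampled series into non-overlapping windows of `size` samples.

    `labels` holds one string per window ("Normal" or an anomaly name); it is
    mapped to 1 (Anomaly, positive) or 0 (Normal, negative). A trailing
    partial window is dropped.
    """
    n = len(signal) // size
    windows = [list(signal[k * size:(k + 1) * size]) for k in range(n)]
    y = [0 if lab == "Normal" else 1 for lab in labels[:n]]
    return windows, y


w, y = make_windows(list(range(200)), ["Normal", "Pothole", "Normal"])
print(len(w), y)
```

Because regular road dominates the drives, the resulting labels are heavily imbalanced towards 0.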

3) Feature Subset Selection: The candidate feature set for sequential selection comprised three groups of five features each, i.e. Group A - Fast Fourier Transform (FFT), Group B - Discrete Wavelet Transform (DWT) and Group C - Peak, each computed along the five dimensions X, Y, Z, XY and XYZ.
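The section does not specify how each feature is computed, so the following is only a sketch under stated assumptions: XY and XYZ are taken as vector magnitudes, the FFT feature as the largest non-DC spectral magnitude, the DWT feature as the energy of level-1 Haar detail coefficients, and Peak as the maximum absolute value. "Sequential selection" is assumed to be greedy forward selection driven by a caller-supplied `score` function (for example, cross-validated classifier accuracy); all names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def dimensions(x, y, z) -> Dict[str, List[float]]:
    """Five signals per window; XY and XYZ assumed to be vector magnitudes."""
    return {
        "X": list(x), "Y": list(y), "Z": list(z),
        "XY": [math.hypot(a, b) for a, b in zip(x, y)],
        "XYZ": [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)],
    }


def peak(s: Sequence[float]) -> float:
    """Group C (assumed): maximum absolute value."""
    return max(abs(v) for v in s)


def fft_peak(s: Sequence[float]) -> float:
    """Group A (assumed): largest non-DC DFT magnitude (naive DFT, fine for 93 samples)."""
    n = len(s)
    best = 0.0
    for k in range(1, n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(s))
        im = -sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(s))
        best = max(best, math.hypot(re, im))
    return best


def haar_detail_energy(s: Sequence[float]) -> float:
    """Group B (assumed): energy of level-1 Haar detail coefficients (odd tail dropped)."""
    return sum(((s[i] - s[i + 1]) / math.sqrt(2)) ** 2 for i in range(0, len(s) - 1, 2))


def window_features(x, y, z) -> Dict[str, float]:
    """All 15 candidates: groups A, B and C over the five dimensions."""
    feats = {}
    for name, sig in dimensions(x, y, z).items():
        feats[f"A_FFT_{name}"] = fft_peak(sig)
        feats[f"B_DWT_{name}"] = haar_detail_energy(sig)
        feats[f"C_Peak_{name}"] = peak(sig)
    return feats


def forward_select(candidates: List[str],
                   score: Callable[[List[str]], float]) -> List[str]:
    """Greedy sequential forward selection: repeatedly add the candidate that
    most improves score(subset); stop when no candidate improves it."""
    chosen: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        cand, val = max(((c, score(chosen + [c])) for c in remaining), key=lambda p: p[1])
        if val <= best:
            break
        chosen.append(cand)
        remaining.remove(cand)
        best = val
    return chosen
```

Forward selection is one common variant; a backward or floating search would fit the same description and could be swapped in behind the same `score` interface.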

[1] S. Mazhar et al., "Design of a memory-card based low-cost GPS data-logger for livestock monitoring," in Proc. 2015 IEEE SENSORS, IEEE, 2015.
