  • Special Issue on Big Medical/Healthcare Data Analytics – Call for Papers

    The last decade has seen huge advances in the scale of data we routinely generate and collect in pretty much everything we do, as well as in our ability to use technology to analyze and understand it. We have been witnessing a digital revolution associated with the development of various emerging technologies including ubiquitous...

  • Special Issue on Big Data and Smart Cities – Call for Papers

    A smart city integrates information and communication technologies, as well as Internet of Things (IoT) solutions, to reduce costs and resource consumption, enhance performance, and connect and engage more effectively and actively with its citizens. This vast and semi-structured collection of city and citizen-related data provides many opportunities for the development of smart...

  • Special Issue on Hybrid Evolutionary and Swarm Techniques for Big Data Analytics and Applications – Call for Papers

    GUEST EDITORS: Kevin Kam Fung Yuen, School of Business, Singapore University of Social Sciences, Singapore (email: kfyuen@suss.edu.sg, kevinkf.yuen@gmail.com); Steven Sheng-Uei Guan, Research Institute of Big Data Analytics, Xi’an Jiaotong-Liverpool University, China (email: Steven.Guan@xjtlu.edu.cn); Richard Everson, Department of Computer Science, Exeter University, United Kingdom (email: R.M.Everson@exeter.ac.uk); Kit Yan Chan,...

  • Virtual Special Issue: Big Data@Elsevier.Computer Science

    To celebrate the IEEE Big Data Conference in Washington on 5-8 December 2016, Elsevier Computer Science presents a virtual special issue on some of the most cited articles on Big Data across all our Computer Science journals. This virtual special issue highlights papers published between 2014 and 2015 that...

  • Black-box Confidence Intervals: Excel and Perl Implementation

    Originally posted here. Check the original article for the most recent updates. Confidence interval is abbreviated as CI. In this new article (part of our series on robust techniques for automated data science), we describe an implementation in both Excel and Perl, and discuss our popular model-free confidence interval technique introduced in our original...

  • Correlation and R-Squared for Big Data

    Originally posted on Analyticbridge, by Dr. Granville. Click here to read the original article and comments. With big data, one sometimes has to compute correlations involving thousands of buckets of paired observations or time series. For instance, a data bucket corresponds to a node in a decision tree, a customer segment, or a subset of observations...

  • How to detect spurious correlations, and how to find the real ones

    Originally posted on DataScienceCentral, by Dr. Granville. Click here to read the original article and comments. Specifically designed for big data in our research lab, the new and simple strong correlation synthetic metric proposed in this article should be used whenever you want to check whether there is a real association between two variables, especially...

  • Practical illustration of Map-Reduce (Hadoop-style), on real data

    Originally posted on DataScienceCentral, by Dr. Granville. Click here to read the original article and comments. Here I will discuss a general framework to process web traffic data. The concept of Map-Reduce will be naturally introduced. Let’s say you want to design a system to score Internet clicks, to measure the chance for a click to...

  • Jackknife logistic and linear regression for clustering and predictions

    Originally posted on DataScienceCentral, by Dr. Granville. Click here to read the original article and comments. This article discusses a far more general version of the technique described in our article The best kept secret about regression. Here we adapt our methodology so that it applies to data sets with a more complex structure, in particular with...

  • A synthetic variance designed for Hadoop and big data

    Originally posted on Hadoop360, by Dr. Granville. Click here to read the original article and comments. The new variance introduced in this article fixes two big data problems associated with the traditional variance and the way it is computed in Hadoop, using a numerically unstable formula. This new metric is synthetic: it was not derived naturally from...
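The “Black-box Confidence Intervals” entry refers to a model-free technique whose details the teaser does not give. As a hedged point of comparison only, here is a minimal Python sketch of the percentile bootstrap, a different and well-known model-free method; the function `bootstrap_ci` and its parameters (`n_resamples`, `level`, `seed`) are hypothetical names chosen for illustration.

```python
import random
import statistics


def bootstrap_ci(data, estimator=statistics.mean, n_resamples=2000, level=0.90, seed=0):
    """Percentile bootstrap CI for estimator(data).

    Assumption: this is the standard percentile bootstrap, NOT the
    article's own model-free technique. It assumes only that the
    observations are independent and identically distributed.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    # Recompute the estimator on many resamples drawn with replacement.
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the lower and upper (1 - level) / 2 percentiles as the interval.
    alpha = (1.0 - level) / 2.0
    lo = estimates[int(alpha * (n_resamples - 1))]
    hi = estimates[int((1.0 - alpha) * (n_resamples - 1))]
    return lo, hi


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9, 4.4, 3.0]
    print(bootstrap_ci(sample))
    print(bootstrap_ci(sample, estimator=statistics.median))
```

Because it only resamples the observed data, the same function works for any estimator passed in, such as `statistics.median`.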
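For the “Correlation and R-Squared for Big Data” entry, one way to handle thousands of buckets is to accumulate each bucket's count, means, and co-moments in a single pass, then derive Pearson's r and R² (for a least-squares line with an intercept, R² equals r²). This is a minimal Python sketch, not the article's method, assuming input arrives as hypothetical `(bucket_key, x, y)` tuples; `bucket_correlations` is an illustrative name.

```python
import math
from collections import defaultdict


def bucket_correlations(rows):
    """rows: iterable of (bucket_key, x, y). Returns {key: (r, r_squared)}.

    Assumption: input format and function name are hypothetical.
    Uses Welford-style one-pass updates of means and co-moments per bucket.
    """
    # Per bucket: [n, mean_x, mean_y, M2_x, M2_y, C_xy]
    stats = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])
    for key, x, y in rows:
        s = stats[key]
        s[0] += 1
        n = s[0]
        dx = x - s[1]          # deviation from the old mean of x
        s[1] += dx / n
        dy = y - s[2]          # deviation from the old mean of y
        s[2] += dy / n
        s[3] += dx * (x - s[1])  # old deviation times new deviation
        s[4] += dy * (y - s[2])
        s[5] += dx * (y - s[2])  # co-moment update
    result = {}
    for key, (n, _, _, m2x, m2y, cxy) in stats.items():
        if n < 2 or m2x == 0 or m2y == 0:
            continue  # r is undefined for single-point or constant buckets
        r = cxy / math.sqrt(m2x * m2y)
        result[key] = (r, r * r)
    return result


if __name__ == "__main__":
    rows = [("seg1", 1, 3), ("seg1", 2, 5), ("seg1", 3, 8),
            ("seg2", 1, 2), ("seg2", 2, 1), ("seg2", 3, 1)]
    for key, (r, r2) in bucket_correlations(rows).items():
        print(key, round(r, 3), round(r2, 3))
```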
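The “spurious correlations” entry proposes its own synthetic metric, which the teaser does not define. A common baseline for the same question is a permutation test: shuffle one variable many times and count how often the shuffled correlation is at least as strong as the observed one. The sketch below is that standard test, not the article's metric; `pearson` and `permutation_p_value` are hypothetical helper names.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Assumption: a standard permutation test, not the article's synthetic metric.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(y_shuffled)  # breaks any real x/y association
        if abs(pearson(x, y_shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    x = list(range(20))
    y = [2 * v + (v % 3) for v in x]
    print(permutation_p_value(x, y, n_permutations=1000))
```

When thousands of variable pairs are screened at once, some will pass any fixed threshold by chance, so the resulting p-values still need a multiple-comparison correction.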
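The Map-Reduce entry describes scoring web clicks. The in-memory Python sketch below only illustrates the map, shuffle, and reduce phases by counting clicks per IP address; the `(ip, user_agent, url)` record layout and all function names are assumptions, and a real Hadoop job would distribute these phases across machines.

```python
from collections import defaultdict
from itertools import chain


def map_click(record):
    """Map phase: emit (key, value) pairs for one log record.

    Assumption: hypothetical record layout (ip, user_agent, url).
    """
    ip, _user_agent, _url = record
    yield (ip, 1)


def shuffle(pairs):
    """Shuffle phase: group all values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: aggregate the values collected for one key."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_click(r) for r in records)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    logs = [("203.0.113.5", "Mozilla/5.0", "/home"),
            ("203.0.113.5", "Mozilla/5.0", "/cart"),
            ("198.51.100.7", "curl/8.0", "/home")]
    print(run_job(logs))
```

Keeping the map and reduce functions free of shared state is what allows a framework to run many copies of them in parallel.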
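The synthetic variance entry says the traditional variance is computed in Hadoop with a numerically unstable formula; presumably this is the one-pass sum-of-squares form, which is assumed here. The sketch below does not reproduce the article's synthetic variance. It contrasts that naive formula with Welford's update and the standard pairwise merge of `(n, mean, M2)` summaries, which lets each mapper summarize its own chunk and a reducer combine the results. All function names are hypothetical.

```python
def naive_variance(xs):
    """Sample variance via sum of squares: unstable when mean >> spread."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def welford(xs):
    """Summarize one chunk as (n, mean, M2) using Welford's update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # old deviation times new deviation
    return n, mean, m2


def merge(a, b):
    """Combine two (n, mean, M2) summaries, e.g. from two mappers."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def sample_variance(summary):
    n, _, m2 = summary
    return m2 / (n - 1)


if __name__ == "__main__":
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("naive:  ", naive_variance(data))
    print("welford:", sample_variance(welford(data)))
    print("merged: ", sample_variance(merge(welford(data[:2]), welford(data[2:]))))
```

On this sample data, the naive formula loses the small deviations to rounding in the huge squared terms, while the Welford and merged summaries both recover the sample variance of 30.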