
How to build a robot that “sees” with $100 and TensorFlow


Adventures in deep learning, cheap hardware, and object recognition.

September 21, 2016

Eye of Providence (source: Bureau of Engraving and Printing on Wikimedia Commons).

Object recognition is one of the most exciting areas in machine learning right now. Computers have been able to recognize objects like faces or cats reliably for quite a while, but recognizing arbitrary objects within a larger image has been the Holy Grail of artificial intelligence. Maybe the real surprise is that human brains recognize objects so well. We effortlessly convert photons bouncing off objects at slightly different frequencies into a spectacularly rich set of information about the world around us. Machine learning still struggles with these simple tasks, but in the past few years, it’s gotten much better.

Deep learning and a large public training data set called ImageNet have made an impressive amount of progress toward object recognition. TensorFlow is a well-known framework that makes it very easy to implement deep learning algorithms on a variety of architectures. TensorFlow is especially good at taking advantage of GPUs, which in turn are also very good at running deep learning algorithms.

Building my robot

I wanted to build a robot that could recognize objects. Years of experience building computer programs and doing test-driven development have turned me into a menace working on physical projects. In the real world, testing your buggy device can burn down your house, or at least fry your motor and force you to wait a couple of days for replacement parts to arrive.

Architecture of the object-recognizing robot
Figure 1. Architecture of the object-recognizing robot. Image courtesy of Lukas Biewald.

The new third-generation Raspberry Pi is perfect for this kind of project. It costs $36 on Amazon.com and has WiFi, a quad-core CPU, and a gigabyte of RAM. A $6 microSD card can load Raspbian, which is basically Debian. See Figure 1 for an overview of how all the components work together, and see Figure 2 for a photo of the Pi.

Raspberry Pi
Figure 2. Raspberry Pi running in my garage. Image courtesy of Lukas Biewald.

I love the cheap robot chassis that Sain Smart makes for around $11. The chassis turns by spinning the wheels at different speeds, which works surprisingly well (see Figure 3).

Robot chassis
Figure 3. Robot chassis. Image courtesy of Lukas Biewald.

The one place I spent more money when cheaper options were available is the Adafruit motor hat (see Figure 4). The DC motors draw more current than the Raspberry Pi can provide, so a separate controller is necessary, and the Adafruit motor hat is super convenient. Using the motor hat required a tiny bit of soldering, but the hardware is extremely forgiving, and Adafruit provides a nice library and tutorial for controlling the motors over I2C. Initially, I used cheaper motor controllers, but I accidentally fried my Pi, so I decided to order a better-quality replacement.

Raspberry Pi with motor hat and camera
Figure 4. Raspberry Pi with motor hat and camera. Image courtesy of Lukas Biewald.

A $15 camera attaches right into the Raspberry Pi and provides a real-time video feed I can use to recognize objects. There are tons of awesome cameras available. I like the infrared cameras that offer night vision.

The Raspberry Pi needs about 2 amps of current, but 3 amps is safer with the speaker we’re going to plug into it. iPhone battery chargers work awesomely for this task. Small chargers don’t actually output enough amps and can cause problems, but the Lumsing power bank works great and costs $18.

A couple of HC-SR04 sonar sensors help the robot avoid crashing into things—you can buy five for $11.

I added the cheapest USB speakers I could find, and used a bunch of zip ties, hot glue, and foam board to keep everything together. As an added bonus, I cut up some of the packaging materials the electronics came with and drew on them to give the robots some personality. I should note here that I actually built two robots (see Figure 5) because I was experimenting with different chassis, cameras, sonar placement, software, and so forth, and ended up buying enough parts for two versions.

My 4WD robot and his 2WD older sister
Figure 5. My 4WD robot (right) and his 2WD older sister. Image courtesy of Lukas Biewald.

Once the robot is assembled, it's time to make it smart. There are a million tutorials for getting started with a Raspberry Pi online. If you've used Linux, everything should be very familiar.

For streaming the camera, the RPi Cam Web interface works great. It’s super configurable and by default puts the latest image from the camera in a RAM disk at /dev/shm/mjpeg/cam.jpg.

If you want to stream the camera data to a webpage (very useful for debugging), you can install Nginx, an extremely fast open source webserver/proxy. I configured Nginx to pass requests for the camera image directly to the file location and everything else to my webserver.

http {
   server {
      location / {
         proxy_pass http://unix:/home/pi/drive.sock;
      }
      location /cam.jpg {
         root /dev/shm/mjpeg;
      }
   }
}

I then built a simple Python webserver to spin the wheels of the robot based on keyboard commands, which made for a nifty remote-control car.
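The article doesn't include that server's code, but a minimal sketch of the idea, assuming Flask and the Adafruit_MotorHAT Python library (the motor ports, speeds, and routes below are made up for illustration), might look like this:

# Hypothetical sketch, not the author's code: a tiny Flask server that maps
# keyboard-style commands to the two drive motors via the Adafruit motor hat.
from flask import Flask
from Adafruit_MotorHAT import Adafruit_MotorHAT

app = Flask(__name__)
hat = Adafruit_MotorHAT(addr=0x60)                 # default I2C address of the motor hat
left, right = hat.getMotor(1), hat.getMotor(2)     # motor ports are an assumption

def drive(left_dir, right_dir, speed=150):
    for motor, direction in ((left, left_dir), (right, right_dir)):
        motor.setSpeed(speed)
        motor.run(direction)

@app.route("/forward")
def forward():
    drive(Adafruit_MotorHAT.FORWARD, Adafruit_MotorHAT.FORWARD)
    return "ok"

@app.route("/left")
def turn_left():
    # turning is just spinning the wheels in different directions
    drive(Adafruit_MotorHAT.BACKWARD, Adafruit_MotorHAT.FORWARD)
    return "ok"

@app.route("/stop")
def stop():
    drive(Adafruit_MotorHAT.RELEASE, Adafruit_MotorHAT.RELEASE, speed=0)
    return "ok"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)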

As a side note, it’s fun to play with the sonar and the driving system to build a car that can maneuver around obstacles.

Programming my robot

Finally, it’s time to install TensorFlow. There are a couple of ways to do the installation, but TensorFlow actually comes with a makefile that lets you build it right on the system. The steps take a few hours and have quite a few dependencies, but they worked great for me.

TensorFlow comes with a prebuilt model called “inception” that performs object recognition. You can follow the tutorial to get it running.

Running tensorflow/contrib/pi_examples/label_image/gen/bin/label_image on an image from the camera will output the top five guesses. The model works surprisingly well on a wide range of inputs, but it’s clearly missing an accurate “prior,” or a sense of what things it’s likely to see, and there are quite a lot of objects missing from the training data. For example, it consistently recognizes my laptop, even at funny angles, but if I point it at my basket of loose wires it consistently decides that it’s looking at a toaster. If the camera is blocked and it gets a dark or blurry image it usually decides that it’s looking at nematodes—clearly an artifact of the data it was trained on.

Robot plugged in
Figure 6. Robot plugged into my keyboard and monitor. Image courtesy of Lukas Biewald.

Finally, I connected the output to the Flite open source software package that does text to speech, so the robot can tell everyone what it’s seeing (see Figure 6).
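As a rough illustration of the glue (not the author's exact pipeline; the --image flag and the output parsing are assumptions), you could classify the latest camera frame and hand the top label to flite like this:

# Hypothetical glue sketch: classify the most recent camera frame and speak the result.
# Paths follow the article; flite's -t flag speaks a literal string.
import subprocess

LABEL_IMAGE = "tensorflow/contrib/pi_examples/label_image/gen/bin/label_image"
FRAME = "/dev/shm/mjpeg/cam.jpg"

out = subprocess.run([LABEL_IMAGE, "--image=" + FRAME],
                     capture_output=True, text=True).stdout
top_guess = out.splitlines()[0] if out else "nothing"   # first of the top-five guesses (assumed format)
subprocess.run(["flite", "-t", "I think I see " + top_guess])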

Testing my robot

Here are my two homemade robots running deep learning to do object recognition.

Final thoughts

From 2003 to 2005, I worked in the Stanford Robotics lab, where the robots cost hundreds of thousands of dollars and couldn’t perform object recognition nearly as well as my robots. I’m excited to put this software on my drone and never have to look for my keys again.

I’d also like to acknowledge all the people that helped with this fun project. My neighbors, Chris Van Dyke and Shruti Gandhi, helped give the robot a friendly personality. My friend, Ed McCullough, dramatically improved the hardware design and taught me the value of hot glue and foam board. Pete Warden, who works at Google, helped get TensorFlow compiling properly on the Raspberry Pi and provided amazing customer support.

Article image: Eye of Providence. (source: Bureau of Engraving and Printing on Wikimedia Commons).


LEARN HOW THE COMMUNITY CREATED A POCKETC.H.I.P. CELL PHONE


Tony using a PocketC.H.I.P. cell phone that Dave built

Pocketeers Juve021 and Rob Baruch figured out how to turn PocketC.H.I.P. into a portable cellular device, and they wrote two great tutorials explaining how you too can build the project.

While we love creating PocketC.H.I.P. projects in-house like Jose's speaker hack, PockulusC.H.I.P., and emulating Apple's System 7, it's extra exciting to see community members developing and sharing what they've done.

A great place to share your projects and ideas is in our forums. You’ll find daily posts, discussions, and tips on how to get the most out of your C.H.I.P. and PocketC.H.I.P.. And if you’re at a loss for what your first PocketC.H.I.P. project should be, it’s a goldmine for inspiration and full of friendly folks happy to help out. It’s a resource not to be missed!

POCKETC.H.I.P. & ADAFRUIT’S FONA 808 CELLULAR MODULE

Working independently, juve021 and Rob Baruch successfully configured PocketC.H.I.P. to work with the popular Adafruit FONA 808 cellular module. They were both able to make phone calls and send SMS, and Baruch even got cellular data (GPRS) working!

The FONA 808 module is an easy-to-solder breakout board with an excellent tutorial and tons of documentation. Though the tutorial examples use an Arduino, much of it is still applicable to PocketC.H.I.P., especially when you combine it with what juve021 and Baruch have written.

Just make sure when you order an 808 you get a SIM card with your purchase. You’ll need one to connect to a cellular network.


HARDWARE SETUP

Another shot of the FONA 808 wired to PocketC.H.I.P.

Juve021 soldered the cellular module directly to the exposed headers on PocketC.H.I.P. with short strands of wire. Check out his forum post for details. Or if you don’t want to solder to PocketC.H.I.P., you can use a USB-to-Serial cable like Baruch’s approach. Either approach will work.


SOFTWARE SETUP

Using PocketC.H.I.P. to send AT commands to the FONA 808

On the software side of the project, both juve021 and Baruch used the command-line program screen to send AT commands between PocketC.H.I.P. and the cell module. These commands dictate to the module what tasks it should perform and what numbers to call or send data to.

In the image above, you can see that juve021 used the AT+CSQ command to check the signal strength of the cellular connection and the AT+CMGS=”PHONE_NUMBER” command to send an SMS to a specific phone number.

Voice calls are made in a similar way. Type in the appropriate AT command to start a call, chat with your friend, and then type a few more commands when you want to end the call. The commands are a bit cryptic, but you can find the specifics in juve021's forum post. And get excited: he's working on a Python script to automate much of the AT command input.
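If you would rather script the module than type into screen, a minimal pyserial sketch (hypothetical; the serial device path, baud rate, and phone number are placeholders, not values from the tutorials) sends the same AT commands:

# Hypothetical sketch using pyserial to talk to the FONA 808 with AT commands.
import time
import serial

fona = serial.Serial("/dev/ttyS0", 115200, timeout=2)   # device path is a placeholder

def at(cmd):
    fona.write((cmd + "\r\n").encode())
    time.sleep(1)
    return fona.read(fona.in_waiting or 1).decode(errors="ignore")

print(at("AT+CSQ"))            # query signal strength
at("AT+CMGF=1")                # switch to text mode before sending an SMS
at('AT+CMGS="PHONE_NUMBER"')   # start an SMS to the number
fona.write(b"Hello from PocketC.H.I.P.\x1a")  # message body, terminated with Ctrl+Z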


CELLULAR DATA SETUP

Rob Baruch took cellular a step further by figuring out how to use cellular data with the FONA 808. This is a bit more challenging to configure, since you’ll need to recompile the Linux kernel with support for Point to Point Protocol (PPP). Helpfully, Baruch has written a tutorial on how to enable PPP in the kernel and then connect the FONA 808 to a cellular data network.

If you’re new to compiling and deploying the Linux kernel, read over the instructions a few times so you’re familiar with the process. If you’re unsure of the process, consult thekernel compiling thread in the forums and post any lingering questions.

When using a custom kernel, it’s always a good idea to backup any important files you have on PocketC.H.I.P.. Using a compression tool to reduce the size of your backup is another good idea. A compressed archive will be faster to transfer to another computer since it’s smaller. To get the archive from PocketC.H.I.P. to your laptop, use SCP or a similar file transfer utility. Then, roll up your sleeves for some Linux fun!

And, if anything does go wrong, you can always use the online flasher to get your PocketC.H.I.P. back to the stock software image.


Best React Native apps to date


With its rise in popularity, it was only a matter of time before apps built with React Native emerged in app stores. React Native is used on a wide scale, from Fortune 500 companies to hot startups, on both iOS and Android platforms. Here’s the list of the best React Native apps to date.

In the first part of our React Native blog post series we covered the history of React Native, so be sure to check it out if you haven't already.

The Ads Manager app is the first fully React Native cross-platform app. Creating and managing Facebook campaigns can be a horror story for your marketing team: working with the desktop version of Ads Manager sometimes requires a lot of time, nerves, and patience, and can end in a temporary “mental breakdown.” The Facebook Ads Manager app, on the other hand, is a completely different experience.

The first thing you'll notice is that the app is lightning fast regardless of the actions you want to perform; from checking the status of a current campaign to creating a new one, it takes a second or two at most to navigate to the next step or access the data. The same goes for more complicated actions, like account switching: a completely different set of data is loaded instantly.

From a design standpoint, the interface is clean with intuitive UX and simple navigation. The animations and transitions are flawless; they do not feel unnatural or buggy at any point. The overall experience is magnificent, and if your marketing team isn’t using the app, we strongly advise them to start.

If you have never experienced the power of VR, Discovery VR will fill that gap perfectly. The app brings heart-pounding adventures in a way you've never experienced before, via either VR or 360 videos. The app's entire UI has been written in React Native, which allowed the developers to embed the gyroscope and video player using the native APIs, demonstrating the true power of React Native.

Discord is another brilliant example of the power of React Native and the kind of performance such an app can achieve while sharing 98% of its code between the iOS and web apps. The app's performance is incredible: switching between teams and loading channels with a long history of conversations happens in the blink of an eye, and switching from direct calls to voice channels and back is as smooth as it should be.

If you have always dreamt of being an NFL coach, CBS Sports Franchise Football is the perfect app for you. It brilliantly simulates the real-life experience of NFL coaches: you need to demonstrate your coaching skills throughout the season, manage the player roster, create new playbooks for your team, analyze your team's performance, assign bonuses, and much more.

We loved the onboarding process: the app guides you through Season 1 to familiarize you with the game and teach you everything you can, and must, do as a coach. And it doesn't stop there; once you unlock new options, the app highlights them to minimize your chances of missing them.

Gyroscope lets you see the complete story of your life; it's a health app on steroids. Not only can you track steps, workouts, or your heart rate, but with dozens of integrations you can also track activities like productivity on the computer, or use the sleep tracker and automatic AI to make sure you get enough sleep.

All the data is displayed in two lovely, well-designed views: Simple or Cards mode. All tracked data is aggregated into daily/weekly/monthly reports, and you can easily deep-dive into it and decide which things you want to focus on next.

Myntra is a perfect example of how a shopping app should look and feel in order to make shopping from your smartphone as easy as possible. Once the app is opened, you just need to pick your interest and the app will populate the Home tab with similar categories or products. It’s a nice way to keep users engaged if they want to discover something new.

We loved the incredible speed of collection loading, the smooth interface, and intuitive UX. For example, if your bag is empty, the app will display a button “Go to wish list,” and if your wish list is also empty, a “Start shopping” button will be displayed — a nice way to lead users towards engagement.

Refinery29 provides a unique way to consume the latest news and top stories. The app will serve just 8 short, fun-to-read cards on a daily basis. We loved the whole concept of the app: it requires minimal engagement on the user side; you can stay updated with the latest news within 2 or 20 minutes; and it’s designed in a completely different way than any other news app.

Sneat takes restaurant apps to the next level for all foodies. Not only does the app provide a unique and simple one-click booking system for restaurants, but its navigation, maps, and payment form are elegantly designed. We hope the app will either extend its restaurant list from Paris-only to other locations, or that similar apps will emerge.

Townske aims to be your travel inspiration city guide on your next trip. The app connects you with locals to get a list of their favorite places and creates a curated list of places to explore and experience as locals do. It’s not mandatory for users to have an account, which is great, as it allows you to quickly find the next location you want to visit.

Imagine that you have low Wi-Fi connectivity, or that your battery is running low — in these cases, it’s a neat feature to have. We loved the design transitions and animations from a list view to a specific guide, as well as the “Save to a list” feature.

The AIGA Design Conference 2015 app is a brilliant example of a conference app: not only does it aggregate a detailed schedule of all events, it also offers an expert local guide to New Orleans. We are amazed by the app's design, including slick animations (sticky section titles), a minimalist map with expandable items, and nice view transitions. The combination of great design and speed results in a perfect conference app.

Infographic of the best React Native apps to date.

Think we missed some brilliant React Native apps? Let us know and add some of your favorite React Native apps in the comments.




Who are Shoutem and Five? Well, we are all about Mobile Apps. Shoutem is the leading app maker platform for building apps, and Five is a Design Driven App Development Company with offices in Europe and NYC.

This article was originally published at blog.shoutem.com by Robert Sekulić.


Knowledge Share | Deep Learning from Zero to Advanced


Editor's note: Every beginner faces the same question: how do you learn dense, specialized material with no relevant background? Leifeng.com previously translated "From 0 to 1: How I Taught Myself Machine Learning in a Year," which recounts Per Harald Borgen's self-study journey. On deep learning, GitHub user songrotek has something to add as well. The original is titled "Deep Learning Papers Reading Roadmap"; compiled and translated by Leifeng.com's Yixin and Lao Lü IO. Do not reproduce without permission.

0. The "Bible" of Deep Learning

When it comes to introductory books, one cannot skip Deep Learning, co-authored by Yoshua Bengio, Ian J. Goodfellow, and Aaron Courville.

"This deep learning textbook is intended to help students and practitioners enter the field of machine learning, with a focus on deep learning." Notably, this MIT Press "book" has been updated and refined online in real time for years, continually adding new research results and references and accepting public comments and corrections; it is so popular that it has been hailed as the "Bible" of deep learning. The book can currently be pre-ordered on Amazon and will reach you by the end of the year.

Read Deep Learning online: http://www.deeplearningbook.org/

1. Survey

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton are hailed by the roadmap's author as the three kings of deep learning. Their Nature paper "Deep Learning" packs in a great deal of research and survey material. Five stars; well worth a read!

[1] http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf

2. Building a Deep Learning Knowledge Network

A leading figure in AI, Geoffrey Hinton now works at Google. His paper with Simon Osindero and Yee-Whye Teh, "A fast learning algorithm for deep belief nets," is treated as canonical and is well worth a look.

[2] http://www.cs.toronto.edu/~hinton/absps/ncfast.pdf

He is also first author of "Reducing the dimensionality of data with neural networks," which can fairly be called a milestone of deep learning.

[3] http://www.cs.toronto.edu/~hinton/science.pdf

3. The ImageNet Revolution

Once you have read the papers above, you should have a rough picture of deep learning. So where did the breakthrough come? In 2012, Krizhevsky's "Imagenet classification with deep convolutional neural networks" marked a breakthrough in neural network research. Don't miss the train; five-star recommendation.

[4] http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

And how much does depth matter to a network? "Very deep convolutional networks for large-scale image recognition," written in 2014 by Karen Simonyan and Andrew Zisserman of Oxford's Visual Geometry Group (VGG), examines the importance of depth and builds a 19-layer network with very good results. The paper placed first in localization and second in classification at ILSVRC.

[5] https://arxiv.org/pdf/1409.1556.pdf

If you want to understand how network architectures have been improved, this one is a must-read. Christian Szegedy, a well-known computer scientist, and his co-authors wrote "Going deeper with convolutions" in 2015 for the ImageNet 2014 competition, where their method took first place in both the classification track (task 1) and the detection track (task 2). The paper focuses on efficient deep network architectures for computer vision: by improving the structure, the network can be made deeper, and the results better, without increasing the computational budget.

[6] http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

In the sixth annual ImageNet image-recognition competition, Microsoft Research's image-recognition system took first place in several categories, beating systems from Google, Intel, Qualcomm, Tencent, and a number of startups and academic labs. Microsoft's winning system, "Deep Residual Learning for Image Recognition," was developed by a team of researchers: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. The paper of the same name, which records the team's experience building the system, is essential reading; five stars.

[7] https://arxiv.org/pdf/1512.03385.pdf

4. Speech Recognition

"Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," co-authored by Geoffrey Hinton and other experts, was a huge breakthrough for speech recognition, bringing together examples from four groups that used deep neural networks for acoustic modeling.

[8] http://cs224d.stanford.edu/papers/maas_paper.pdf

Beyond the papers above, Geoffrey Hinton's ideas also flow freely in "Speech recognition with deep recurrent neural networks," which introduces the importance of deep recurrent neural networks (RNNs) in speech recognition.

[9] https://arxiv.org/pdf/1303.5778.pdf

Voice input is familiar to most of us, but how is it actually achieved? "Towards End-To-End Speech Recognition with Recurrent Neural Networks," written by Alex Graves and the University of Toronto's Navdeep Jaitly, describes a speech-to-text system that needs no intermediate phonetic representation.

[10] http://www.jmlr.org/proceedings/papers/v32/graves14.pdf

If you ask what lies at the root of Google's speech recognition system, I would point you to "Fast and accurate recurrent neural network acoustic models for speech recognition," written by Haşim Sak and several other experts; it is one of the important theoretical foundations of Google's speech recognition system.

[11] https://arxiv.org/pdf/1507.06947.pdf

Baidu recently announced a new result from its Silicon Valley AI Lab (SVAIL) called Deep Speech 2, which uses a single learning algorithm to accurately recognize both English and Mandarin. The work is published in "Deep speech 2: End-to-end speech recognition in english and mandarin."

[12] https://arxiv.org/pdf/1512.02595.pdf

On the 18th of this month, researchers and engineers from Microsoft's Artificial Intelligence and Research division published "Achieving Human Parity in Conversational Speech Recognition." The paper shows that Microsoft's conversational speech recognition reached a word error rate (WER) as low as 5.9% on the industry-standard Switchboard benchmark, matching professional human transcribers for the first time and outperforming the vast majority of people, while also beating the 6.3% record Microsoft itself had set a month earlier. Microsoft's chief speech scientist Xuedong Huang is one of the participants in this research. Leifeng.com has covered this before; see the original article for details.

[13] https://arxiv.org/pdf/1610.05256v1.pdf

Having read the papers recommended above, you should now have a basic understanding of the history of deep learning, its basic model architectures (CNN/RNN/LSTM), and how deep learning is applied to image and speech recognition. In the next part, a new batch of papers will give you a clear picture of deep learning methods and their applications in different fields. Since the second part branches into more specialized topics, you can pick and choose according to your own research direction.

Recommended reading:

New from Microsoft Research: conversational speech recognition surpasses humans, with an error rate of only 5.9%

From 0 to 1: How I taught myself machine learning in a year


GBDX


The GBDX platform provides cloud-based access to DigitalGlobe’s vast current and historical library of geospatial data along with the tools and algorithms necessary to extract useful information from that data — at scale! This creates the ideal ecosystem for you to create new customer solutions without the expense of owning and operating costly data and IT infrastructure. Using the platform you can:

  • Search, access, and process imagery in a manner that allows for rapid geospatial information product creation at any scale
  • Build new applications, or extend existing ones by leveraging the GBDX capabilities and embedding them in customer-facing interfaces

Infrastructure designed to scale to your needs

GBDX uses the Amazon Web Services (AWS) cloud infrastructure to enable a set of APIs that can perform scalable geo-compute against imagery data. The GBDX environment enables simple access to both storage and computation in a manner that is easily managed: GBDX brings the compute to the data rather than the data to the compute. Data residing within GBDX is stored in S3 ‘buckets’ that are accessed through GBDX RESTful (REST) web service APIs. The GBDX Workflow API lets you access state-of-the-art computer vision and remote sensing algorithms from DigitalGlobe and ENVI while dynamically scaling computation power. If you don’t like the built-in tools, that’s fine; just import your own algorithms into the system and run them at scale.

Complete access to 15 years of earth imagery

DigitalGlobe owns and operates the most agile and sophisticated constellation of commercial earth imaging satellites in the world which collect 3,000,000 km2 of Earth imagery every day. Of all high-resolution commercial imagery collected since 2010, DigitalGlobe has collected approximately 80% of it.

DigitalGlobe imagery collection over 90 days
We collect 3,000,000 sq km of Earth imagery every day

 

The environment and algorithms you need to extract actionable information

GBDX is a robust environment for building, accessing and running advanced algorithms designed for information extraction from imagery datasets at scale. Accessible through our APIs, current algorithm examples include car counting, orthorectifying, land use, land cover, and atmospheric compensation. These algorithms are being developed by both DigitalGlobe and 3rd party developers.

REST APIs

REST APIs access and control the various elements of the GBDX environment, including data discovery, staging of working sets, workflow orchestration, etc. Actions are performed by exchanging representations in JSON format. The platform’s API uses the standard HTTP request methods: GET, POST and DELETE.

Catalog API

The catalog API provides for the search and discovery of data via a set of 39 different attributes associated with each image. These attributes include standard image characteristics such as geographic location, cloud cover percentage, sensor type, sun angle and many others. Through the Catalog API, you can quickly find the data set you need for further processing.
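As a purely illustrative sketch (the endpoint path, filter fields, and token below are placeholders rather than the documented GBDX API), a catalog search over REST might look like this in Python:

# Illustrative only: a JSON search request against a placeholder catalog endpoint.
# Real GBDX endpoint paths, field names, and the auth flow should be taken from the docs.
import requests

token = "YOUR_GBDX_TOKEN"  # obtained via the platform's auth flow (placeholder)
search = {
    "searchAreaWkt": "POLYGON((-105.0 39.7, -104.9 39.7, -104.9 39.8, -105.0 39.8, -105.0 39.7))",
    "filters": ["cloudCover < 10", "sensorPlatformName = 'WORLDVIEW03'"],
}

resp = requests.post(
    "https://geobigdata.io/catalog/v1/search",      # placeholder URL
    json=search,
    headers={"Authorization": "Bearer " + token},
)
for record in resp.json().get("results", []):
    print(record.get("identifier"))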

Workflow API

The Workflow API contains tools for managing imagery data and executing tasks against sets of data. Using the workflow, you first find data through the Catalog API, then assign the selected dataset to a working set, and then launch individual tasks/algorithms or a chained set of tasks/algorithms.

Docker

GBDX relies on Docker for the deployment of algorithms and applications as tasks within the system. Developers who want to run algorithms at scale through the GBDX environment must containerize their capabilities into a standard, uniform interface through Docker. Docker fits neatly into the ecosystem of the catalog APIs, workflow APIs, Docker Hub, and the GBDX developer community.

Connections between Docker, Catalog APIs, Workflow APIs and the DigitalGlobe factory create a robust environment for GBDX developers

 

Support, when you need it

Behind the GBDX infrastructure, data, and tools is a team of experts from DigitalGlobe’s operations and solutions teams to support the technology, giving you access to our decades of experience and knowledge building geospatial imagery applications.

Ecosystem

The GBDX platform gives developers and partners the ability to create workflows and algorithms that can be shared or sold. Currently the system supports a number of services from 3rd-party partners.

GBDX provides unprecedented power at reasonable cost

  • Access to algorithms, image data and compute infrastructure coupled with new business models allow for broad scale geo-analysis
  • Developer friendly pricing enables the creation of new products and services
  • Short term image data rental approach enables project execution at a low cost
  • Ecosystem enables access to a broad selection of industry tools to advance product and project goals and also enables opportunities for new passive revenue streams

See it in action

https://fast.wistia.net/embed/iframe/74l14cagj7

Get it now!

We offer three different ways to work with the GBDX platform:

GBDX Evaluation Tier

  • Includes access to imagery data within two geographical areas
  • Use of our Web Application or GBDX APIs to order and process data from these two areas

>>More information
>>Register for Evaluation Tier Access

GBDX Developer Access

  • Includes an R&D license for creation of new products and algorithms
  • Derived products cannot be sold to end customers
  • Developer-friendly pricing

>> Register for developer access

Production Access

  • Large scale extractions incorporated into developed product or for internal use
  • Payment is done on a revenue share or a cost per TB/month basis

Call Us
(866) 308-4120 Americas
+44 203-695-2291 Europe, Middle East, Africa, Russia
+65 3158-5033 Asia, Pacific


First proof-of-concept: A GPS Fix in Windows using an RTL-SDR stick


Using a $20 RTL-SDR stick with a 1 ppm TCXO and a simple mod to power an active GPS antenna, it is possible to download and decode GPS signals in real time.
Software
The two packages used are GNSS-SDRLIB and RTKLIB. Both are Open Source (GPL2 and BSD 2-clause, respectively). The default build target for both packages is Windows, although RTKLIB has been compiled under Linux.

Please note that GNSS-SDRLIB is not to be confused with GNSS-SDR!
Resources
Useful presentation by the author of GNSS-SDRGUI for a summer school course. Also check the manuals included with GNSS-SDRLIB and RTKLIB.
Step-By-Step Implementation
  1. Ensure RTL-SDR stick is working in Windows. If your driver is not working, try using the Zadig driver installation method outlined here.
  2. Install GNSS-SDRLIB and RTKLIB to any convenient directory
  3. Open GNSS-SDRGUI and select the following options
    1. Input Type: RTL-SDR
    2. [x] RTCM MSM, Port 9999
    3. Change “output interval” dropdown to 10 Hz
    4. [x] Plot Tracking
    5. [x] All GPS, GLONASS, Galileo satellites
    6. (optional) enter approximate lat/lon into MISC and click the “…” button to get current satellite locations in relation to your location.
  4. Click “Start”; a number of command consoles will open and then close, one for each satellite being tracked.
  5. Click “M” for log
  6. Now, open RTKNAVI
    1. Click on the “I” button
      1. check “rover”, type TCP Client, format RTCM3
      2. click OPT button and set address to “localhost” and port to “9999”
      3. click OK
      4. Click OK
    2. Click on the “start” button
    3. Within a few seconds you should see satellites in the Rover:Base SNR pane
    4. Once a solution exists it will update lat/lon in the left pane
    5. Click “Plot” to generate a plot of the random walk of lat/lon over time
This is what your GNSS-SDRGUI should look like:
This is what your RTKNAVI input should look like:
If everything is working you should get a GPS solution, as shown in this video
Next Actions
Try using GNSS-SDR and/or GNURadio to decode the GNSS signal; this would provide native, Linux-compatible headless execution, and a cursory Google search suggests this has been done successfully with the RTL-SDR stick.

Knowledge Share | Deep Learning from Zero to Advanced, Part 2


Yesterday, Leifeng.com published "Knowledge Share | Deep Learning from Zero to Advanced!". By now readers should have a basic grasp of the history of deep learning, its basic model architectures (CNN/RNN/LSTM), and how deep learning is applied to image and speech recognition. In today's installment, a new batch of papers will give you a clear picture of deep learning methods and their applications in different fields. Since this second part branches into more specialized topics, you can pick and choose according to your own research direction. A short introduction has been added for each paper. The list is split into two installments, compiled and translated by Lao Lü IO and Yixin; do not reproduce without permission from Leifeng.com.

1. Deep Learning Models

"Improving neural networks by preventing co-adaptation of feature detectors," co-authored by Geoffrey Hinton and other experts, is also well worth studying. The paper proposes that when training a neural network on relatively few samples, Dropout is a useful trick for preventing overfitting.

[1] https://arxiv.org/pdf/1207.0580.pdf

On Dropout, Nitish Srivastava and other experts also co-wrote "Dropout: a simple way to prevent neural networks from overfitting." The paper argues that deep neural networks with huge numbers of parameters are extremely powerful machine learning systems, but overfitting remains a hard problem in such systems, and Dropout is a technical shortcut for dealing with it.

[2] http://www.jmlr.org/papers/volume15/srivastava14a.old/source/srivastava14a.pdf
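The mechanism itself is tiny; as a quick illustration (a sketch, not code from the paper), an inverted-dropout forward pass in numpy looks like this:

# Minimal illustration of (inverted) dropout on one layer's activations.
import numpy as np

def dropout(activations, keep_prob=0.5, train=True):
    if not train:
        return activations                          # no dropout at test time
    mask = (np.random.rand(*activations.shape) < keep_prob) / keep_prob
    return activations * mask                       # drop units and rescale the survivors

h = np.random.randn(4, 8)
print(dropout(h, keep_prob=0.8))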

Training a deep neural network is exceptionally tricky, because changing the parameters of any one layer shifts the input distribution of every layer after it, which makes training inefficient. In "Batch normalization: Accelerating deep network training by reducing internal covariate shift," Sergey Ioffe and Christian Szegedy zero in on the crux of this problem: internal covariate shift.

[3] https://arxiv.org/pdf/1502.03167.pdf

Training deep neural networks is also very demanding computationally; to shorten training time, the activity of the neurons has to be normalized, and normalization techniques of this kind have been the breakthrough here. The approach is laid out in "Layer normalization," co-authored by several experts.

[4] https://arxiv.org/pdf/1607.06450.pdf?utm_source=sciontist.com&utm_medium=refer&utm_campaign=promote

"Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1" only appeared this February. Its main idea is to binarize the weights and activations to speed the network up and reduce its memory footprint. Since a binary network merely binarizes the parameters and activation values without changing the network structure, the things to pay attention to are how the binarization is done and how the parameters are updated after binarization.

[5] https://pdfs.semanticscholar.org/f832/b16cb367802609d91d400085eb87d630212a.pdf
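The binarization step itself is just a sign function; here is a toy numpy sketch (not from the paper) of using binarized weights in the forward pass while keeping real-valued weights around for the updates:

# Toy illustration: binarize weights to +1/-1 for the forward pass, keep real weights for updates.
import numpy as np

w_real = np.random.randn(3, 3) * 0.1        # real-valued weights kept for gradient updates
w_bin = np.where(w_real >= 0, 1.0, -1.0)    # binarized weights used in the forward pass

x = np.random.randn(4, 3)
y = x @ w_bin                                # forward pass uses only +1/-1 weights
print(w_bin)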

"Decoupled neural interfaces using synthetic gradients" is a very interesting neural network paper from Google DeepMind, which uses synthetic gradients to break the update dependencies of backprop. Five-star recommendation.

[6] https://arxiv.org/pdf/1608.05343.pdf

2. Optimization for Deep Learning

"On the importance of initialization and momentum in deep learning" covers the importance of initialization and momentum in deep learning, with the emphasis on experimental analysis.

[7] http://www.jmlr.org/proceedings/papers/v28/sutskever13.pdf

Adam is a gradient-based optimization method, similar to SGD. For details, see "Adam: A method for stochastic optimization."

[8] https://arxiv.org/pdf/1412.6980.pdf

"Learning to learn by gradient descent by gradient descent," by Marcin Andrychowicz and other experts, uses an LSTM to learn the update rule of a neural network; that is, it learns an optimizer by gradient descent and then uses that optimizer to optimize the parameters of other networks. Highly instructive; five stars.

[9] https://arxiv.org/pdf/1606.04474.pdf

Stanford's Song Han, Huizi Mao, and other experts have written a series of papers on network compression; "Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding" is one of them. The title already summarizes the paper's three key ideas with complete clarity. It also won the ICLR 2016 Best Paper award; five stars.

[10] https://pdfs.semanticscholar.org/5b6c/9dda1d88095fa4aac1507348e498a1f2e863.pdf

"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size," by Forrest N. Iandola and other experts, opens by listing three benefits of smaller deep networks at the same accuracy. It then introduces SqueezeNet, presenting a network whose classification accuracy approaches AlexNet while the model is 510x smaller, and summarizes design guidelines for shrinking model size.

[11] https://arxiv.org/pdf/1602.07360.pdf

3. Unsupervised Learning / Deep Generative Models

"Building high-level features using large scale unsupervised learning" explains the feature learning behind Google Brain: detectors for human and cat faces are learned from unlabeled images. The paper builds a nine-layer locally connected sparse autoencoder on big data and trains it for three days on 1,000 machines (16,000 cores) with model parallelism and asynchronous SGD; the experiments show that a face detector can be trained without ever labeling whether an image contains a face.

[12] https://arxiv.org/pdf/1112.6209.pdf&embed

Diederik P. Kingma and Max Welling wrote "Auto-encoding variational bayes," which proposes a method that fuses variational Bayes with neural networks and can be used to build autoencoder-style generative models.

[13] https://arxiv.org/pdf/1312.6114.pdf

"Generative adversarial nets" is Ian Goodfellow's 2014 paper on what are known as adversarial networks, promoted in many tutorials as the representative work of unsupervised deep learning. It tackles a famous problem in unsupervised learning: given a batch of samples, train a system that can generate similar new samples. Five stars.

[14] http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

"Unsupervised representation learning with deep convolutional generative adversarial networks" takes the adversarial model prototype proposed in the GAN paper and gives a convolutional-network implementation. It also describes implementation details such as parameter settings, mentions measures that ease (though do not fully solve) the training instability of GANs, and discusses using adversarial networks for semi-supervised learning: after training, the discriminator can be used to extract image features; feed in labeled training images, take the convolutional-layer outputs as X and the labels as Y for training.

[15] https://arxiv.org/pdf/1511.06434.pdf

"DRAW: A recurrent neural network for image generation," from Google, describes how the Deep Recurrent Attentive Writer (DRAW) framework generates images automatically. Five stars.

[16] http://jmlr.org/proceedings/papers/v37/gregor15.pdf

"Pixel recurrent neural networks" is an ICML award-winning paper from Google that explains how pixel recurrent neural networks model images down to the last pixel. The authors build a general model of natural images on deep recurrent networks and significantly improve its efficiency, and they propose novel two-dimensional LSTM layers, the Row LSTM and the Diagonal BiLSTM, that extend more easily to other data.

[17] https://arxiv.org/pdf/1601.06759.pdf

"Conditional Image Generation with PixelCNN Decoders" comes from Google's DeepMind team. They study a model based on the PixelCNN (pixel convolutional neural network) architecture that generates new images as its conditioning input changes. Given ImageNet class labels, the model generates varied, realistic scenes such as animals and landscapes; given unseen face photos produced by other convolutional networks, it generates photos of the same person with different expressions and poses.

[18] https://arxiv.org/pdf/1606.05328.pdf

Recommended reading:

Knowledge Share | Deep Learning from Zero to Advanced!


Knowledge Share | Deep Learning from Zero to Advanced, Part 3


Leifeng.com previously published "Knowledge Share | Deep Learning from Zero to Advanced!". By now readers should have a basic grasp of the history of deep learning, its basic model architectures (CNN/RNN/LSTM), and how deep learning is applied to image and speech recognition. In today's installment, a new batch of papers will give you a clear picture of deep learning methods and their applications in different fields. Since this second part branches into more specialized topics, you can pick and choose according to your own research direction. Leifeng.com has added a short introduction for each paper. The list is split into two installments, compiled and translated by Lao Lü IO and Yixin; do not reproduce without permission from Leifeng.com.

4. Recurrent Neural Networks / Sequence-to-Sequence Models

"Generating sequences with recurrent neural networks," by Alex Graves, explains how recurrent neural networks can generate handwriting.

[19] https://arxiv.org/pdf/1308.0850.pdf

"Learning phrase representations using RNN encoder-decoder for statistical machine translation" tackles English-to-French translation with an encoder-decoder model: the encoder RNN turns the input sequence into a vector, and the decoder turns that vector into the output sequence, so word-order information can be incorporated. Representing a sequence as a vector also makes it easy to see that semantically similar phrases cluster together.

[20] https://arxiv.org/pdf/1406.1078.pdf

"Sequence to sequence learning with neural networks" is the sequence-to-sequence learning method proposed by Google's I. Sutskever and colleagues; its most direct application is machine translation.

[21] http://papers.nips.cc/paper/5346-information-based-learning-by-agents-in-unbounded-state-spaces.pdf

The attention mechanism was first proposed in the vision and image domain. Later, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate," used an attention-like mechanism to perform translation and alignment jointly on a machine translation task; they can be counted as the first team to bring attention into NLP.

[22] https://arxiv.org/pdf/1409.0473v7.pdf

"A Neural Conversational Model" is the earliest paper to build a dialogue model on the sequence-to-sequence framework; even though the model structure it uses is not complicated and the network does not have many layers, the results are quite impressive.

[23] https://arxiv.org/pdf/1506.05869.pdf

5. Neural Turing Machines

"Neural turing machines" introduces the Neural Turing Machine, a neural network architecture inspired by biologically plausible memory and by digital computers. Like an ordinary neural network, the architecture is differentiable end to end and can be trained by gradient descent. The experiments show it can learn simple algorithms from example data and generalize those algorithms well beyond the training samples themselves. An unreserved five-star recommendation.

[24] https://arxiv.org/pdf/1410.5401.pdf

Neural Turing machines are one of the three major research directions in deep learning today. "Reinforcement learning neural Turing machines" trains the network with reinforcement learning algorithms, giving the Neural Turing Machine's interface much greater expressive power.

[25] https://pdfs.semanticscholar.org/f10e/071292d593fef939e6ef4a59baf0bb3a6c2b.pdf

"Memory networks" was written by four experts. The so-called Memory Network is really a general framework: the input map, the memory-update map, the output map, and the response map can all be swapped out.

[26] https://arxiv.org/pdf/1410.3916.pdf

"End-to-end memory networks" solves, on the algorithmic side, the problem of training memory networks end to end, and on the application side addresses question answering and language modeling.

[27] http://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf

"Pointer networks" proposes a new network architecture for learning to map an input sequence to an output sequence. Unlike previous work, both the input and output lengths are variable, and the output length depends on the input.

[28] http://papers.nips.cc/paper/5866-pointer-networks.pdf

"Hybrid computing using a neural network with dynamic external memory" is Google DeepMind's paper first published in Nature. It introduces a form of memory-augmented neural network called the differentiable neural computer and shows that it can learn to use memory to answer questions about complex structured data, including artificially generated stories, family trees, and even a map of the London Underground. It can also solve a block-puzzle task using reinforcement learning. Five stars.

[29] https://www.dropbox.com/s/0a40xi702grx3dq/2016-graves.pdf

6. Deep Reinforcement Learning

At last we arrive at deep reinforcement learning. And how could we not start with the paper that first proposed it? Mnih's "Playing atari with deep reinforcement learning" combines a convolutional neural network with Q-learning and uses a single network to play seven Atari 2600 games (think Breakout) that require only short-term memory. The results show the algorithm needs no hand-engineered features and can generate unlimited samples for supervised-style training.

[30] http://arxiv.org/pdf/1312.5602.pdf

As for the milestone of deep reinforcement learning, that honor goes to the same author's "Human-level control through deep reinforcement learning." The authors invented the DQN, or deep Q-network, which lets an artificial neural network classify objects directly from raw sensor input, successfully realizing an end-to-end reinforcement learning algorithm that learns successful policies directly from high-dimensional sensory inputs.

[31] http://www.davidqiu.com:8888/research/nature14236.pdf
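Stripped of the convolutional network, the target that DQN regresses toward is the classic Q-learning target; here is a toy tabular sketch (illustration only, not DeepMind's code):

# Toy tabular Q-learning update; DQN approximates the same target with a deep network.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def update(state, action, reward, next_state):
    target = reward + gamma * Q[next_state].max()      # bootstrap from the best next action
    Q[state, action] += alpha * (target - Q[state, action])

update(state=0, action=1, reward=1.0, next_state=2)
print(Q)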

The next paper, "Dueling network architectures for deep reinforcement learning," proposes a new network, the dueling architecture, made up of a state-value function and a state-dependent action-advantage function. This architecture yields better policy evaluation when many actions have similar values. It won the ICML 2016 Best Paper award.

[32] http://arxiv.org/pdf/1511.06581

"Asynchronous methods for deep reinforcement learning," from DeepMind, mainly improves performance on Atari 2600 games and is regarded as the classic example of asynchronous updates with samples collected from multiple parallel instances.

[33] http://arxiv.org/pdf/1602.01783

Compared with traditional planning methods, the deep Q-learning approach in "Continuous control with deep reinforcement learning" can be applied to continuous action domains and robustly solves 20 simulated motor tasks. It uses an actor-critic algorithm built on the Deterministic Policy Gradient (DPG) from ICML 2014, named DDPG.

[34] http://arxiv.org/pdf/1509.02971

"Continuous Deep Q-Learning with Model-based Acceleration" uses an advantage function for reinforcement learning, again focusing on continuous action spaces. As the title says, to speed up the agent's acquisition of experience the study also uses a Kalman filter together with local linear models. The experimental results show the method does somewhat better than the DDPG of the previous paper.

[35] http://arxiv.org/pdf/1603.00748

Schulman's "Trust region policy optimization" counts as a breakthrough for computers playing games: the TRPO algorithm produces results in no way inferior to DeepMind's, demonstrating a generalized capacity to learn. Besides teaching a robot to walk, we can also turn it into a skilled gamer.

[36] http://www.jmlr.org/proceedings/papers/v37/schulman15.pdf

The next paper presents the algorithm used by the famous AlphaGo. In "Mastering the game of Go with deep neural networks and tree search," Google uses a 13-layer policy network and Monte Carlo tree search to teach a computer to play Go. Five stars, of course; argue with me if you disagree.

[37]  http://willamette.edu/~levenick/cs448/goNature.pdf

7. Unsupervised Feature Learning

"Deep Learning of Representations for Unsupervised and Transfer Learning" can be considered the founding work of unsupervised feature learning.

[38] http://www.jmlr.org/proceedings/papers/v27/bengio12a/bengio12a.pdf

The next paper, "Lifelong Machine Learning Systems: Beyond Learning Algorithms," mainly asks whether a machine learning system with lifelong-learning ability can use knowledge from problems it has already solved to help it solve new ones; in other words, whether it can generalize from one case to another. It was first presented at the 2013 AAAI Spring Symposium.

[39] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.696.7800&rep=rep1&type=pdf

The godfather of artificial intelligence is back; this time, together with Dean, he brings "Distilling the knowledge in a neural network," that is, compressing neural networks. The core innovation seems modest, though, so four stars.

[40] http://arxiv.org/pdf/1503.02531

"Policy distillation" was written by Google's Andrei Alexandru Rusu; a companion piece is Parisotto's "Actor-mimic: Deep multitask and transfer reinforcement learning." Both address problems in the RL domain.

[41] http://arxiv.org/pdf/1511.0629

[42] http://arxiv.org/pdf/1511.06342

Here is another paper by Andrei, "Progressive neural networks," which proposes an algorithm of the same name: train the machine learner in a simulated environment, then transfer the knowledge to the real one. This should dramatically speed up how fast robots learn.

[43] https://arxiv.org/pdf/1606.04671

8. One Step Away

The following five papers are not recommended strictly for deep learning, but some of the basic ideas they contain are well worth borrowing.

"Human-level concept learning through probabilistic program induction": five stars. The paper introduces the Bayesian Program Learning (BPL) framework: how new concepts can be learned and processed from just a few simple examples, with humans as the learning subject.

[44] http://clm.utexas.edu/compjclub/wp-content/uploads/2016/02/lake2015.pdf

It is also well worth reading Koch's "Siamese Neural Networks for One-shot Image Recognition" alongside "One-shot Learning with Memory-Augmented Neural Networks."

[45] http://www.cs.utoronto.ca/~gkoch/files/msc-thesis.pdf

[46]http://arxiv.org/pdf/1605.06065

"Low-shot visual object recognition," which puts the emphasis on large-scale data, is a necessary step toward image recognition.

[47]http://arxiv.org/pdf/1606.02819

Those are the papers worth reading in the second stage; stay tuned for further updates.

Recommended reading:

Knowledge Share | Deep Learning from Zero to Advanced!

Knowledge Share | Deep Learning from Zero to Advanced, Part 2



5 EBooks to Read Before Getting into A Machine Learning Career


A carefully curated list of 5 free ebooks to help you better understand the various aspects of machine learning and the skills necessary for a career in the field.

Note that, while there are numerous machine learning ebooks available for free online, including many which are very well-known, I have opted to move past these “regulars” and seek out lesser-known and more niche options for readers.

Interested in a career in machine learning? Don’t know where to start? Well, there’s always here, a collection of tutorials on pursuing machine learning in the Python ecosystem. If you are looking for something more, you could look here for an overview of MOOCs and freely-available online university lectures.

Of course, nothing substitutes rigorous formal education, but let’s say that isn’t in the cards for whatever reason. Not all machine learning positions require a PhD; it really depends where on the machine learning spectrum one wants to fit in. Check out this motivating and inspirational post, the author of which went from little understanding of machine learning to actively and effectively utilizing techniques in their job within a year.

Looking to strike a balance between what you would learn in an introductory graduate school machine learning regimen and what you can get from online tutorials? As they have been for hundreds of years, books are a great place to turn.🙂 Of course, today we have instant access to freely-available digital books, which makes this a very attractive alternative. Have a look at the following free ebooks, all of which are appropriate for an introductory level of understanding, but which also cover a variety of different concepts and material.

1. Introduction to Machine Learning

Nils J. Nilsson of Stanford put these notes together in the mid 1990s. Before you turn up your nose at the thought of learning from something from the 90s, remember that foundation is foundation, regardless of when it was written about.

Sure, many important advancements have been made in machine learning since this was put together, as Nilsson himself says, but these notes cover much of what is still considered relevant elementary material in a straightforward and focused manner. There are no diversions related to advancements of the past few decades, which authors often want to cover tangentially even in introductory texts. There is, however, a lot of information about statistical learning, learning theory, classification, and a variety of algorithms to whet your appetite. At < 200 pages, this can be read rather quickly.

2. Understanding Machine Learning: From Theory to Algorithms

This book on machine learning is written by Shai Shalev-Shwartz and Shai Ben-David. It is newer, longer, and more advanced than the previous offering, but it is also a logical next step. It delves deeper into more algorithms and their descriptions, and provides a bridge toward practicality as well. The focus on theory should be a clue to newcomers of its importance for really understanding what is powering machine learning algorithms. The Advanced Theory section covers some concepts which may be beyond the scope or desire of a newcomer, but the option exists to have a look.

3. Bayesian Reasoning and Machine Learning

This introductory text on Bayesian machine learning is one of the most well-known on the topic as far as I am aware, and happens to have a free online version available. An Amazon review from Arindam Banerjee of the University of Minnesota has this to say:

The book has wide coverage of probabilistic machine learning, including discrete graphical models, Markov decision processes, latent variable models, Gaussian process, stochastic and deterministic inference, among others. The material is excellent for advanced undergraduate or introductory graduate course in graphical models, or probabilistic machine learning. The exposition throughout the book uses numerous diagrams and examples, and the book comes with an extensive software toolbox…

It should be noted that the toolbox being referred to is implemented in MATLAB, which is no longer the default machine learning implementation language, at least not generally. The toolbox is not the book’s only virtue, however.

This provides a great jumping off point for those interested in probabilistic machine learning.

4. Deep Learning

This is the soon-to-be-released-in-print deep learning book by Goodfellow, Bengio and Courville, which has a freely-available final draft copy on its official website.

The following 2 excerpts are from the book’s website, one providing an overview of its contents, the other putting almost everyone interested in reading the book at ease:

The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon.

One of these target audiences is university students (undergraduate or graduate) learning about machine learning, including those who are beginning a career in deep learning and artificial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background, but want to rapidly acquire one and begin using deep learning in their product or platform.

You would be hard-pressed to find a better resource from which to learn all about deep learning.

5. Reinforcement Learning: An Introduction

Sutton and Barto’s authoritative classic is getting a makeover. This is a link to the second draft, which is currently in progress (and freely-available while it is).

Reinforcement learning is of incredible research interest these days, and for good reason. Given its recent high-profile success as part of AlphaGo, its potential in self-driving cars and similar systems, and its marriage with deep learning, there is little reason to believe that reinforcement learning, which is undoubtedly to play a major role in any form of “General AI” (or anything resembling it), is going anywhere. Indeed, these are all reasons that a second draft of this book is in the works.

You can get a sense of the importance of this book in the field of reinforcement learning given that it is referred to simply as “Sutton and Barto.” This Amazon review from David Tan sums the book up nicely (and allays any fears related to “is it too complex for me to understand?”):

The book starts with examples and intuitive introduction and definition of reinforcement learning. It follows with 3 chapters on the 3 fundamental approaches to reinforcement learning: Dynamic programming, Monte Carlo and Temporal Difference methods. Subsequent chapters build on these methods to generalize to a whole spectrum of solutions and algorithms.

The book is very readable by average computer students. Possibly the only difficult one is chapter 8, which deals with some neural network concepts.

Do keep in mind the above is in regards to the first edition; it should generalize to the second, however.

I wish you well on your quest to learn more about machine learning from free ebooks. Check the related links below for ever more related ebook resources.

Related:


A Continuous Integration Solution for the Front End


A while ago I read an excellent article, 《前端开源项目持续集成三剑客》 ("The Three Musketeers of Continuous Integration for Open-Source Front-End Projects"), and wanted to try applying it to my own project. (OK, to be honest, I'm just a badge collector.)

Continuous Integration

Continuous integration should be nothing new to the back end; it's commonplace there, even. But for the front end, where until recently (big shops aside) unit tests weren't necessarily even written, it may be an unfamiliar term.

However, as the front end develops at full speed, keeps absorbing the experience the back end has accumulated over the years, and takes unit testing more and more seriously, continuous integration is gradually coming into view as one piece of front-end engineering.

So what exactly is continuous integration?

Continuous integration (CI): a software engineering practice in which all engineers' working copies of the software are merged into a shared mainline several times a day. —— Wikipedia

Simply put, it means merging code together at a certain frequency.

Using continuous integration keeps the project:

  • In a testable and releasable state
  • Easier to debug: when an integration fails, the cause can be narrowed down to the commits made since the last successful integration
  • Easy to roll back to a previous version

Travis-CI vs CircleCI

In the Three Musketeers article, the author recommends two CI tools: travis-ci and circleci.

Er… which one to pick?

A quick look at both products shows that their websites are both very clean, their documentation clear, and their features roughly the same. Although circleci supports Bitbucket repositories, which travis-ci does not, it has one big drawback: circleci is free for only one container, and running on OS X costs extra. Travis-ci, by contrast, is free for any open-source project on GitHub and supports running on OS X.

Travis-ci it is.

Registering for travis takes a single step: click the Sign In button and link your GitHub account. Once logged in, running travis takes just three more steps:

  1. Add the project you want travis to manage
  2. Add a .travis.yml configuration file to the project
  3. Push your code

Meanwhile, travis configuration is also extremely simple. If there are no special requirements, you only need to specify the language and its version.

// .travis.yml
language: node_js
node_js:
  - "6"

With that, a simple, working travis configuration is done.

A Travis build has two main phases:

  • install: install dependencies; in a node environment this defaults to npm install
  • script: run the build command; in a node environment this defaults to npm test

So the configuration above is equivalent to:

language: node_js
node_js:
  - "6"
install: npm install
script: npm test

Of course, travis has more lifecycle phases than these two; the extra configuration options are all described on the official site.

OK. Push some code and try it out.

All of travis's run information can be seen in the job log.

If the run succeeds, you can give your project a badge via https://img.shields.io/travis…https://img.shields.io/travis…, like this

Tips: replace USER, REPO, and BRANCH with your own information.

Codecov vs Coveralls

With a build badge in hand, the next one to collect is a test-coverage badge. The Three Musketeers article uses coveralls, but its website turns out to have a rather classic look, unlike today's minimalist sites, and its documentation is thin and fairly basic, so I checked whether there was anything better.

And found codecov.

Clean and free. I like it.

Its documentation is also clearer and more detailed than coveralls'. After trying it, I'm even more convinced the choice was a wise one. ^_^

Codecov is so simple to use that you can configure it without even reading the docs.

First, go to the home page and pick the sign-in button matching wherever your source is hosted; I chose GitHub. The first login asks for authorization.

After authorizing, you'll see a screen like the one below, listing your personal account and the organizations you belong to.

On first use there are no repositories by default; click + Add my first repository to add the repository you want codecov to manage.

After choosing a repository, you'll see a page like the one below, with no data yet, of course.

The first few tabs display information and stay empty until configuration is done and a run has happened; while configuring, you only need the last tab, Settings.

Switching the menu on the left shows the settings and the badge information respectively. Pretty great, right?

Neither codecov nor coveralls runs the test suite itself to measure your project's coverage; instead they collect coverage reports and other key information and analyze them statically.

Codecov accepts lcov, gcov, and correctly formatted JSON as input.

So if you use JEST as your test framework and enable coverage (collectCoverage), uploading the report is trivial, because JEST uses istanbul to generate the coverage report, i.e. lcov. Just install codecov:

npm install codecov --save-dev

Then, after CI runs, upload the report. For example, like this:

language: node_js
node_js:
  - "6"
cache:
  directories: node_modules
script:
  - npm run test:coverage
  # codecov isn't installed globally here, so run it through npm
  - npm run codecov
os:
  - linux
  - osx

How to get this badge was covered above, so I won't show it again here.

SAUCELABS vs BrowserStack

For cross-browser testing there are, again, two choices, and this time I'm on the same side as the Three Musketeers author: SAUCELABS.

However, since JEST does not support end-to-end testing, we have to turn to another test framework to do cross-browser tests. I don't intend to use karma here, even though karma has a ready-made integration plugin for SAUCELABS, karma-sauce-launcher.

Don't ask me why; I'm just that stubborn.

You're really not going to ask? Fine, I'll say it anyway: the existing test framework, JEST, already does most of what karma can do, so pulling in karma solely for end-to-end testing isn't worth it.

After gathering and comparing some material, I chose Nightwatch to handle cross-browser testing.

What’s Nightwatch?

Nightwatch.js is an automated testing framework for web applications and websites, written in Node.js and using the W3C WebDriver API (formerly Selenium WebDriver).

It is a complete browser (End-to-End) testing solution which aims to simplify the process of setting up Continuous Integration and writing automated tests.

As the official introduction makes clear, Nightwatch is exactly what the problem at hand calls for. (If your project uses Angular, you could also try Protractor.)

While digging around, I noticed that the very first issue on nightwatch was filed by Evan You.

The farther you go, the more you find the road covered in footprints the greats have left behind.

Respect.

Back to the topic. Setting up e2e tests with nightwatch is also quite easy; here is the process in brief.

First, install it with npm; nothing more to say there.
Then, add a configuration file in the project root, either nightwatch.conf.js or nightwatch.json.
Next, write the corresponding tests; see the official site for the API.
Finally, run the test command.

The main thing to look at is how to tie nightwatch's tests together with saucelabs and travis-ci. Let's start with the configuration file.

// nightwatch.conf.js
module.exports = {
    src_folders: ['tests/e2e'], // directory containing the test files
    output_folder: 'tests/reports', // where test reports are written
    custom_commands_path: 'tests/saucelabs', // custom commands, used here to push test status to saucelabs
    custom_assertions_path: '',
    page_objects_path: '',
    globals_path: '',

    test_workers: {
        enabled: true,
        workers: 'auto'
    },

    test_settings: {
        default: {
            launch_url: 'http://localhost:8080', // target URL, read inside the tests
            selenium_port: 4445, // Selenium server port (the Selenium server is provided by saucelabs)
            selenium_host: 'localhost', // Selenium server address (the Selenium server is provided by saucelabs)
            username: process.env.SAUCE_USERNAME,
            access_key: process.env.SAUCE_ACCESS_KEY,
            silent: true,
            screenshots: {
                enabled: false,
                path: ''
            },
            globals: {
                waitForConditionTimeout: 15000
            },
            // the settings below are important!!!
            desiredCapabilities: {
                build: `build-${process.env.TRAVIS_JOB_NUMBER}`,
                public: 'public',
                'tunnel-identifier': process.env.TRAVIS_JOB_NUMBER
            }
        },

        // per-environment configurations follow
        chrome: {
            desiredCapabilities: {
                browserName: 'chrome'
            }
        },

        firefox: {
            desiredCapabilities: {
                browserName: 'firefox'
            }
        },

        internet_explorer_10: {
            desiredCapabilities: {
                browserName: 'internet explorer',
                version: '10'
            }
        },

        internet_explorer_11: {
            desiredCapabilities: {
                browserName: 'internet explorer',
                version: '11'
            }
        },

        edge: {
            desiredCapabilities: {
                browserName: 'MicrosoftEdge'
            }
        }
    }
};

A few points to note here (important!!! — these tormented me for nearly a week):

  • To run tests against localhost, you must start Sauce Connect
  • With Sauce Connect running, set the environment to selenium_port: 4445, selenium_host: 'localhost'

The points above matter for local testing; the following matter when wiring in travis:

  • Set 'tunnel-identifier': process.env.TRAVIS_JOB_NUMBER, where process.env.TRAVIS_JOB_NUMBER is a global variable of the travis run
  • Set process.env.SAUCE_USERNAME and process.env.SAUCE_ACCESS_KEY, covered in detail below
  • Set the build and public properties, which identify the test run and control who can view it; both matter for generating the browser matrix badge at the end, and both are also mentioned in the Three Musketeers article

With nightwatch and saucelabs configured, tweak the travis configuration to fold saucelabs in.

// .travis.yml
language: node_js
node_js:
- '6'
cache:
  directories: node_modules
# build the app and start a local server on travis, for the e2e tests
before_script:
- npm run build
- node server.js &
script:
- npm run test:coverage
- npm run codecov
- npm run test:e2e
os:
- linux
- osx
env:
  global:
  - secure: v6CRj4CKMqxEQ9MSYKAkbmrBgIBZvoppICx6JyjQXhexPOVQKBvboCgdL0lOOZdGZ9rEqSMXvud97kBAFYd1sdP/kSwXdUct5BOMIT3a5GLtY5aQfOocBwR6IvmZpO2U+4VhrCwkzdaq2Ehq0fAXF1pkxDj9YkJZmwDNhTdfDGkib+AwDyr4TLQFC1QrD/4vmrULb3NZdW1KadFYjLzVF8FMa2tDSYMFFVymYu5nuCa/Z0dqSfFy8McYwBMzThDkDRHMT/sf4zKDPyxUwN7xGfC6T88xzCEaltN6K7MGMGKvl7Y0p7VjYW/+rO38936kj6xuPU6J7Vh2yKPJhhT2LtM7ucuo0XSpIxCxaKXWeEmYl2KkCMWNHgrWACE//WBFRNx/JQHimw+abr1Zt/3V9QmSEvnB3hHB0NQgJ2nVrVDjk51RSVaiP4sfQ8GVqEwr1+wJqe4wz7fV+jvRB9uUGgGsjsBbZi6ZycoMtOBoJ+miviRCjZvf9sOZKfIDjcuE5vETQcE37d/++yplCG0N83Kx+q67mbWXirfNj2CfXp7pwHTN+n21v1BSicXqQ6+jaNzD/pcN/GTHgZ5A+VkdcjSmEziuQTO035i1nnCB9TQdFeRdGdfo6DAiq8YOfyVkQ1lml6lWqbPqa4QWokRUD2yA/hAIzNWe5BeLF2JFQBc=
  - secure: S0vWVM74eiAHhk+kqqvym9aIgqaaGyGz9H3rfmEZoG4iuvXjXRaHOOSHxIRVsh5RYXr0PWHAj24fpN5AyUOlu5NQiwACBqmpw9KZBgVekWFshA5uYmpNpCG9w5/UAQa9q2+EcndOCM4lAyuT2wVJ5WfsHRzIA5jUpK1YmUYtuVICTSkumRoEaxfPkwzcGLF7f6aP7mG1YRKeO1F9+RhBfaGN1kYordxIk/fniH8OFB0XiLZ5OIovaAIYFKic0P1wUFwa78jU2fovdObS8JySl2LP19eaLX0MgAFoPB7oLFPxFBN7FCID41TEodDdZtcNnKJT4uQ/iWRqww2BOwVQM9whyBTg8J4kJZALicR4CzGCuUbdyQd2kh/hNZ9d9SKb6YXdcZElFmh3FY6zgfgv5PAx+jDlkfzmgBh7OD5OM4GVrsCsjnaAlmTUNtRPx9B4ps0gbr25F1PxuNy+MXfwSYJdliL+N01BTpiGyts/EXAraWvEm5YkhWfTnbgc8osd3cX9vwB0QHksK+BpkaEs6XCwU6kGMxAJIlafRv6RslREdTPBpYaXB4sGqdYXWY+YFqNxsAwTB3KWIq/uhZmSkou1jZfZa2QonMuVot68U11U7afmPzX8KOVeO2IEcUjt6I4eCYQ+31xO/wSLIQ1uoRySQ2S9VCzr+yzDpu0KVps=
addons:
  sauce_connect: true

你肯定会诧异 global 下面的那两串长🐜是什么东西。它们其实就是在 nightwatch.conf.js 中用到的 process.env.SAUCE_USERNAMEprocess.env.SAUCE_ACCESS_KEY

那它们是怎么来的哪?

First, install the Travis CLI: gem install travis
Then, log in with your GitHub account: travis login
Once logged in, run travis encrypt SAUCE_USERNAME=<your SauceLabs username> --add and
travis encrypt SAUCE_ACCESS_KEY=<your SauceLabs access key> --add
to encrypt the username and access key. The --add flag appends the result to .travis.yml automatically, so there is no need to worry about mis-pasting or missing characters.

With that, the whole cross-browser test suite is integrated with CI. There is quite a lot of configuration, so if you are interested, read it alongside the project itself. (Click here)

Finally, don't forget the original goal: adding the badge. It can be found under SauceLabs' Dashboard -> Automated Builds.

Overall, Nightwatch + SauceLabs + Travis is a fairly convenient way to run automated cross-browser tests. The time I lost came from initial unfamiliarity, the scarcity of related material, and SauceLabs' less-than-friendly documentation. Whatever small advantage JEST bought us during coverage testing was paid back in full here.

Automatically Publish

If you have read this far, you might think CI only runs your tests and reports coverage. You would be wrong.

CI is not limited to running tests; it can also publish a successful build to a server. Imagine merging code into the main branch and having CI not only run the tests but also deploy the passing build to your server, with no extra manual steps. Pretty cool, right?

Here is an example of publishing code to github.io via Travis CI.

Modify the .travis.yml file above once more.

language: node_js
node_js:
- '6'
cache:
  directories: node_modules
before_script:
- npm run build
- node server.js &
script:
- npm run test:coverage
- npm run codecov
- npm run test:e2e
after_success:
- bash ./deploy.sh
os:
- linux
- osx
env:
  global:
  - USER_NAME: Disciple_D
  - USER_EMAIL: disciple.ding@gmail.com
  - GIT_DEPLOY_KEY: XXXXXXXX
  - secure: v6CRj4CKMqxEQ9MSYKAkbmrBgIBZvoppICx6JyjQXhexPOVQKBvboCgdL0lOOZdGZ9rEqSMXvud97kBAFYd1sdP/kSwXdUct5BOMIT3a5GLtY5aQfOocBwR6IvmZpO2U+4VhrCwkzdaq2Ehq0fAXF1pkxDj9YkJZmwDNhTdfDGkib+AwDyr4TLQFC1QrD/4vmrULb3NZdW1KadFYjLzVF8FMa2tDSYMFFVymYu5nuCa/Z0dqSfFy8McYwBMzThDkDRHMT/sf4zKDPyxUwN7xGfC6T88xzCEaltN6K7MGMGKvl7Y0p7VjYW/+rO38936kj6xuPU6J7Vh2yKPJhhT2LtM7ucuo0XSpIxCxaKXWeEmYl2KkCMWNHgrWACE//WBFRNx/JQHimw+abr1Zt/3V9QmSEvnB3hHB0NQgJ2nVrVDjk51RSVaiP4sfQ8GVqEwr1+wJqe4wz7fV+jvRB9uUGgGsjsBbZi6ZycoMtOBoJ+miviRCjZvf9sOZKfIDjcuE5vETQcE37d/++yplCG0N83Kx+q67mbWXirfNj2CfXp7pwHTN+n21v1BSicXqQ6+jaNzD/pcN/GTHgZ5A+VkdcjSmEziuQTO035i1nnCB9TQdFeRdGdfo6DAiq8YOfyVkQ1lml6lWqbPqa4QWokRUD2yA/hAIzNWe5BeLF2JFQBc=
  - secure: S0vWVM74eiAHhk+kqqvym9aIgqaaGyGz9H3rfmEZoG4iuvXjXRaHOOSHxIRVsh5RYXr0PWHAj24fpN5AyUOlu5NQiwACBqmpw9KZBgVekWFshA5uYmpNpCG9w5/UAQa9q2+EcndOCM4lAyuT2wVJ5WfsHRzIA5jUpK1YmUYtuVICTSkumRoEaxfPkwzcGLF7f6aP7mG1YRKeO1F9+RhBfaGN1kYordxIk/fniH8OFB0XiLZ5OIovaAIYFKic0P1wUFwa78jU2fovdObS8JySl2LP19eaLX0MgAFoPB7oLFPxFBN7FCID41TEodDdZtcNnKJT4uQ/iWRqww2BOwVQM9whyBTg8J4kJZALicR4CzGCuUbdyQd2kh/hNZ9d9SKb6YXdcZElFmh3FY6zgfgv5PAx+jDlkfzmgBh7OD5OM4GVrsCsjnaAlmTUNtRPx9B4ps0gbr25F1PxuNy+MXfwSYJdliL+N01BTpiGyts/EXAraWvEm5YkhWfTnbgc8osd3cX9vwB0QHksK+BpkaEs6XCwU6kGMxAJIlafRv6RslREdTPBpYaXB4sGqdYXWY+YFqNxsAwTB3KWIq/uhZmSkou1jZfZa2QonMuVot68U11U7afmPzX8KOVeO2IEcUjt6I4eCYQ+31xO/wSLIQ1uoRySQ2S9VCzr+yzDpu0KVps=
addons:
  sauce_connect: true

As you can see, I added an after_success step; the commands under it run only after the preceding tests have succeeded. You could of course use another hook instead, such as deploy.

Publishing to github.io means pushing code to the repository's gh-pages branch. To push to GitHub from Travis CI, an SSH connection has to be established first. Since we are publishing one specific repository, I recommend authorizing Travis CI with a repository deploy key rather than an access token.

So how do you set up a deploy key?

  1. Generate a new SSH key locally (click here if you are unsure how)
  2. In the GitHub repository you want to publish, go to Settings -> Deploy keys -> Add deploy key, paste the contents of the freshly generated key.pub file into the input box, remember to tick Allow write access, and click Add key. The deploy key is now in place, but the private key obviously cannot be put on GitHub as-is; it has to be encrypted first.
  3. Encrypt the deploy key with the Travis CLI: travis encrypt-file key. This produces a key.enc file; commit that file to the repository, but never commit the generated key and key.pub files
  4. After encryption, the console prints a line similar to openssl aes-256-cbc -K $encrypted_c7881d9cb8b5_key -iv $encrypted_c7881d9cb8b5_iv -in key.enc -out key -d, which is what establishes the SSH connection. Take the characters between $encrypted_ and _key and store them as a runtime environment variable, i.e. the GIT_DEPLOY_KEY: XXXXXXXX in the .travis.yml above, so the deploy script can use it

OK. The deploy key is ready; the deploy script follows.

#!/bin/bash
set -e # Exit with nonzero exit code if anything fails

# Git variables
TARGET_PATH="build/"
TARGET_BRANCH="gh-pages"

# Travis encrypt variables
ENCRYPTED_KEY="encrypted_${GIT_DEPLOY_KEY}_key"
ENCRYPTED_IV="encrypted_${GIT_DEPLOY_KEY}_iv"

# Save some useful information
REPO=`git config remote.origin.url`
SSH_REPO=${REPO/https:\/\/github.com\//git@github.com:}
SHA=`git rev-parse --verify HEAD`

# Build source
npm run build

# Set committer git info
git config user.name $USER_NAME
git config user.email $USER_EMAIL

# Force add build folder to git
git add -f $TARGET_PATH

# Commit the build code, that is a local commit for git subtree split
git commit -m "Deploy to GitHub Pages: ${SHA}"

# Split build file as a $TARGET_BRANCH of git
git subtree split -P $TARGET_PATH -b $TARGET_BRANCH

# Add ssh authorization
openssl aes-256-cbc -K ${!ENCRYPTED_KEY} -iv ${!ENCRYPTED_IV} -in deploy_key.enc -out deploy_key -d

# Change the deploy_key mod to fix ssh permissions too open error
chmod 600 deploy_key
eval `ssh-agent -s`
ssh-add deploy_key

# Push code to git
git push -f $SSH_REPO $TARGET_BRANCH

With a few simple variable changes this script can be adapted to your own project; you can of course also write a deploy script of your own.

Jenkins

Everything above assumes open-source code hosted on GitHub, but I suspect most of you deal more often with your company's private code, for example Stash, which ties into Jira.

First, Stash has since been renamed to the previously mentioned Bitbucket, so you only need to swap Travis CI for CircleCI; the other two services both support Bitbucket.

Second, if the project repository is on neither GitHub, Bitbucket, nor GitLab, don't panic; this is when you bring out the all-purpose Jenkins.

Among Jenkins' thousands of plugins there is bound to be one that fits. For example, the old version of Stash can be configured by following this article.

Finally

Let's review the entire CI flow.

When code is pushed to a GitHub branch, Travis CI is triggered and kicks off the full test-and-publish pipeline.

First, the project dependencies are installed;
Then, the tests run, including the unit tests and the e2e tests;
Once the tests pass, the built code is automatically published to the gh-pages branch;
After that, the project is reachable at https://<username>.github.io/<project-name>.

Done~

Now take a look at the result: the badges. Click here to view the source code.

About Badges

All badge information can be found on shields.io; you can even define custom badges, like this one. Hahaha~

So, want to collect badges? Go fill in those tests~

References:

  1. 前端开源项目持续集成三剑客 (The three musketeers of continuous integration for front-end open-source projects)
  2. 一个靠谱的前端开源项目需要什么? (What does a solid front-end open-source project need?)
  3. Zero to Hero with End-to-End tests using Nightwatch, SauceLabs and Travis
  4. Auto-deploying built products to gh-pages with Travis (click to preview)
  5. Continuous Integration with Stash and Jenkins

The Next Wave of Deep Learning Applications


September 14, 2016


Last week we described the next stage of deep learning hardware developments in some detail, focusing on a few specific architectures that capture what the rapidly-evolving field of machine learning algorithms requires. This week we focus on a trend that is moving faster than the devices can keep up with: the codes and application areas that are set to make this market spin in 2017.

It was with reserved skepticism that we listened, not even one year ago, to dramatic predictions about the future growth of the deep learning market—numbers that climbed into the billions, despite the fact that most applications in the area were powering image tagging or recognition, translation, and other more consumer-oriented services. This was not to say that the potential of deep learning could not be seen springing from these early applications, but rather that the enterprise and scientific possibilities were just on the edge of the horizon.

In the meantime, significant hardware and algorithmic developments have been underway, propping up what appears to be an initial Cambrian explosion of new applications for deep learning frameworks in areas as diverse as energy, medicine, physics, and beyond.

What is most interesting is that in our careful following of peer-reviewed research over the last couple of years, it was only just this past month that a large number of deep learning applications in diverse domains have cropped up. These breathe new life into the market figures for deep learning that seemed staggering, at best—at worst, woefully optimistic.

These also help explain why companies like Intel are keen to acquire both the hardware and software stacks of companies like Nervana Systems and Movidius, why Nvidia has staked its future on deep learning acceleration, and why a wealth of chip startups with everything from custom ASICs to FPGAs and other devices have rushed to meet a market that, until very recently, just hasn't been present in sufficient volume to warrant such hype. As a counterbalance to that statement, a significant uptick in research employing various deep learning frameworks does not create a market out of thin air either, but the point is that there is momentum in areas of high enterprise and scientific value—and it keeps building.

In the last two weeks alone we have seen research that breaks new ground in each of the following domains via neural networks and advanced machine learning frameworks. The listing below provides just a few select examples of the wave that hit the publication shores since the summer. Take note of the emphasis on medical applications for neural networks and machine learning. This appears to be where the most aggressive publishing is happening results-wise—and for an emerging market putting all the right tooling and coding in place, it starts to put some substance behind those billion dollar projections.

Advanced Melanoma Screening and Detection

Researchers at the University of Michigan are putting advanced image recognition to work, detecting one of the most aggressive, yet treatable in its early stages, types of cancer. Melanoma can not only be deadly, but it can also be difficult to screen accurately. The team trained a neural network to isolate features (texture and structure) of moles and suspicious lesions for better recognition. The team says “the experimental results of qualitative and quantitative evaluations demonstrate that the method can outperform other state-of-the-art algorithms” for detecting melanoma known to date.

Neural Networks for Brain Cancer Detection

A team of French researchers note that spotting invasive brain cancer cells during surgery is difficult, in part because of the effects of lighting in operating rooms. They found that using neural networks in conjunction with Raman spectroscopy during operations allows them to detect the cancerous cells more easily and reduce residual cancer post-operation. In fact, this piece is one of many over the last few weeks that matches advanced image recognition and classification with various types of cancer and screening apparatus; more in the short list below.

Machine Learning for Ultrasound Images, Pre-Natal Care

A collaborative team of researchers from the UK and Australia have applied image recognition and machine learning techniques to automatically interpret signs of fetal distress and to guide pre-operative strategies to mitigate potentially unhealthy conditions in the womb. Although constrained by the limited training sets available for the neural networks, this research shows promise for further exploration, according to the authors.

Weather Forecasting and Event Detection

This traditional area for large-scale supercomputers is now becoming a hotbed for neural network development, particularly when it comes to weather event (pattern) detection. In one such use case, computational fluid dynamics codes are matched with neural networks and other genetic algorithm approaches to detect cyclone activity.

Energy Market Price Forecasting using Neural Networks

Researchers in Spain and Portugal have applied artificial neural networks to the energy grid in an effort to predict price and usage fluctuations. The daily and intraday markets for the region are organized in a daily session, where next-day sale and electricity purchase transactions are carried out, and in six intraday sessions that consider the energy offer and demand that may arise in the hours following the daily viability schedule fixed after the daily session. In short, being able to make adequate predictions based on the patterns of consumption and availability yields far higher efficiency and cost savings. More on how this model was put together and deployed here.

More on energy and wind generator prediction models can also be found in this paper, as well as this one for the Canadian energy market, both published last week. Yet another, also published this week, does similar work in determining load and balancing for hybrid power facilities.

Neural Networks in Space Mission Efforts

An Italian team of researchers focused on CubeSats (a new category of space systems for missions in low Earth orbit) faces several technical challenges on a number of different fronts. Their research focuses “attention on event detection capabilities, with the intent of enabling autonomous operations for a nanosatellite mission by presenting an artificial intelligence algorithm based on neural network technology, and applies it to a future mission used as a case study.” This is, as it sounds, a particularly complex, dense paper with a lot of unknowns, but worth a read to see how neural networks are being considered to solve optimization and other problems.

Neural Networks in Finance

Futures markets have seen phenomenal success since their inception, in both developed and developing countries, during the last four decades. This success is attributable to the tremendous leverage the futures provide to market participants. This study analyzes a trading strategy which benefits from this leverage by using the Capital Asset Pricing Model (CAPM) and the cost-of-carry relationship. The team applies the technical trading rules developed from spot market prices to futures market prices using a CAPM-based hedge ratio. Historical daily prices of twenty stocks from each of the ten markets (five developed markets and five emerging markets) are used for the analysis. Popular technical indicators, along with artificial intelligence techniques like neural networks and genetic algorithms, are used to generate buy and sell signals for each stock and for portfolios of stocks.

Trading and risk management are two areas where we would expect to see developments for neural networks. Also of note this last week was the application of neural networks to predict corporate bankruptcies (matched against other predictive approaches). Another interesting piece this week looks at using neural networks to determine much larger-scale banking and financial health.

Neural Networks in Civil and Mechanical Engineering

This study from a team in Indonesia utilizes artificial neural networks to predict the structural response (story drift) of multi-story reinforced concrete buildings under earthquake load in the region of Sumatra Island. Modal response spectrum analysis is performed to simulate earthquake loading and produce structural response data for further use in the ANN. The ANN architecture comprises three layers: an input layer, a hidden layer, and an output layer. Earthquake load parameters from 11 locations in Sumatra Island, soil condition, and building geometry are selected as input parameters, whereas story drift is selected as the output parameter for the ANN. As many as 1080 data sets were used to train the ANN and 405 data sets for testing. The trained ANN is capable of predicting story drift under earthquake loading at a 95% rate of prediction and the calculated Mean-Squared Errors (MSE).

Also for civil engineers and city planning purposes, neural networks are being deployed to help predict traffic speed conditions in various settings, and another study looks at classification patterns for traffic accidents.

There are quite a few more examples to highlight, but these were some we handpicked to showcase the diversity of applications. Below is yet another pared-down list of other application areas for neural networks, all from the last couple of weeks of published research.

Biological/Earth Sciences

Estimation of Chlorophyll Concentration Index at Leaves using Artificial Neural Networks

Background Categorization for Automatic Animal Detection in Aerial Videos Using Neural Networks

Solar radiation prediction using fuzzy logic and neural networks

Automated Detection of Deep-Sea Animals

Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks

Prediction of Water-Level in the Urmia Lake Using the Extreme Learning Machine Approach

Electronics, Sensors, Equipment

Combined Geometric and Neural Network Approach to Generic Fault Diagnosis in Satellite Actuators and Sensors

Objectness Scoring and Detection Proposals in Forward-Looking Sonar Images with Convolutional Neural Networks

Materials, Manufacturing, and Industry

Intelligent integrated optimization of mining and pre-dressing grades in metal mines

Experimental Investigation And Analysis Of Wear Behaviour Of Aluminium Metal Matrix Composites Reinforced With Sic And Graphite

Prediction of Dust Dispersion by Drilling Operation Using Artificial Neural Networks

Robust Adaptive Voltage Control of Electric Generators for Ships

Sociology, Psychology, and the Humanities

The application of artificial neural networks in predicting children’s giftedness

Short Story Popularity Prediction using Neural Networks with Time Series-Based Circular Dependencies

Additional Medical Advancements using Neural Networks

Using Radial Basis Function Neural Networks for Continuous and Discrete Pain Estimation from Bio-physiological Signals

Quantifying Radiographic Knee Osteoarthritis Severity using Deep Convolutional Neural Networks

Breast cancer mammography diagnosis approach using dual tree complex wavelet transform and artificial neural networks

Mammographic Mass Classification Using Functional Link Neural Network with Modified Bee Firefly Algorithm

Again, remember that this is not a comprehensive list, but it is notable that so many new additions to the base of literature, from so many disciplines, have appeared in just the last few weeks. Just one year ago, we pulled the hype hat over our eyes to some extent–after all, this was most useful in tagging images on social sites and getting machines to paint pictures. The potential for higher purposes was there (the supercomputing world is seeing it too) but just beyond reach.

We are, it is safe to say, at the real beginning of mainstream applications for deep learning.


Deep learning for complete beginners


Deep learning for complete beginners: Recognising handwritten digits, by Cambridge Coding Academy | Download notebook

Introduction

Welcome to the first in a series of blog posts that is designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purposes of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes—airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

MNIST CIFAR-10

The accelerated growth of deep learning has led to the development of several very convenient frameworks, which allow us to rapidly construct and prototype our models, as well as offering no-hassle access to established benchmarks such as the aforementioned two. The particular environment we will be using is Keras, which I’ve found to be the most convenient and intuitive for essential use, but still expressive enough to allow detailed model tinkering when it is necessary.

By the end of this part of the tutorial, you should be capable of understanding and producing a simple multilayer perceptron (MLP) deep learning model in Keras, achieving a respectable level of accuracy on MNIST. The next tutorial in the series will explore techniques for handling larger image classification tasks (such as CIFAR-10).

(Artificial) neurons

While the term “deep learning” allows for a broader interpretation, in practice, for a vast majority of cases, it is applied to the model of (artificial) neural networks. These biologically inspired structures attempt to mimic the way in which the neurons in the brain process percepts from the environment and drive decision-making. In fact, a single artificial neuron (sometimes also called a perceptron) has a very simple mode of operation—it computes a weighted sum of all of its inputs $\vec{x}$, using a weight vector $\vec{w}$ (along with an additive bias term, $w_0$), and then potentially applies an activation function, $\sigma$, to the result.

Some of the popular choices for activation functions include (plots given below):
identity: $\sigma(z) = z$;
sigmoid: especially the logistic function, $\sigma(z) = \frac{1}{1 + \exp(-z)}$, and the hyperbolic tangent, $\sigma(z) = \tanh z$;
rectified linear (ReLU): $\sigma(z) = \max(0, z)$.

Original perceptron models (from the 1950s) were fully linear, i.e. they only employed the identity activation. It soon became evident that tasks of interest are often nonlinear in nature, which led to the use of other activation functions. Sigmoid functions (owing their name to their characteristic “S”-shaped plot) provide a nice way to encode initial “uncertainty” of a neuron in a binary decision, when $z$ is close to zero, coupled with quick saturation as $z$ shifts in either direction. The two functions presented here are very similar, with the hyperbolic tangent giving outputs within $[-1, 1]$, and the logistic function giving outputs within $[0, 1]$ (and therefore being useful for representing probabilities).

In recent years, ReLU activations (and variations thereof) have become ubiquitous in deep learning—they started out as a simple, “engineer’s” way to inject nonlinearity into the model (“if it’s negative, set it to zero”), but turned out to be far more successful than the historically more popular sigmoid activations, and also have been linked to the way physical neurons transmit electrical potential. As such, we will focus exclusively on them in this tutorial.
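
To make the neuron's mode of operation concrete, here is a minimal NumPy sketch (not part of the original tutorial): a weighted sum of the inputs plus a bias, passed through one of the activation functions listed above. The input and weight values are arbitrary illustrations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # the logistic function

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, w0, activation=relu):
    """Single artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = np.dot(w, x) + w0
    return activation(z)

x = np.array([0.5, -1.0, 2.0])   # example inputs
w = np.array([0.1, 0.4, -0.3])   # example weights
print(neuron(x, w, w0=0.2, activation=relu))      # ReLU output
print(neuron(x, w, w0=0.2, activation=np.tanh))   # hyperbolic tangent output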

A neuron is completely specified by its weight vector $\vec{w}$, and the key aim of a learning algorithm is to assign appropriate weights to the neuron based on a training set of known input/output pairs, such that the notion of a “predictive error/loss” will be minimised when applying the inputs within the training set to the neuron. One typical example of such a learning algorithm is gradient descent, which will, for a particular loss function $E(\vec{w})$, update the weight vector in the direction of steepest descent of the loss function, scaled by a learning rate parameter $\eta$:

$$\vec{w} \leftarrow \vec{w} - \eta \frac{\partial E(\vec{w})}{\partial \vec{w}}$$

The loss function represents our belief of how “incorrect” the neuron is at making predictions under its current parameter values. The simplest such choice of a loss function (that usually works best for general environments) is the squared error loss; for a particular training example $(\vec{x}, y)$ it is defined as the squared difference between the ground-truth label $y$ and the output of the neuron when given $\vec{x}$ as input:

$$E(\vec{w}) = \left(y - \sigma\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)\right)^2$$

There are many excellent tutorials online that provide a more in-depth overview of gradient descent algorithms—one of which may be found on this website! Here the framework will take care of the optimisation routines for us, and therefore I will not dedicate further attention to them.
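
Purely as an illustration (the tutorial relies on Keras for the actual optimisation), here is a hedged sketch of a single gradient-descent update for one neuron with the identity activation, $\sigma(z) = z$, and the squared error loss; in that case the gradient with respect to the weights is $-2(y - z)\vec{x}$, and $-2(y - z)$ for the bias. The example inputs and learning rate are arbitrary.

import numpy as np

def gradient_descent_step(x, y, w, w0, eta=0.1):
    """One gradient-descent update for a linear neuron (identity activation)
    trained with the squared error loss E = (y - (w.x + w0))^2."""
    z = np.dot(w, x) + w0          # neuron output under the identity activation
    error = y - z
    grad_w = -2.0 * error * x      # dE/dw
    grad_w0 = -2.0 * error         # dE/dw0
    return w - eta * grad_w, w0 - eta * grad_w0

w, w0 = np.zeros(2), 0.0
for _ in range(100):               # repeatedly step towards the target y = 1.0
    w, w0 = gradient_descent_step(np.array([1.0, 2.0]), 1.0, w, w0)
print(w, w0)                       # the loss has shrunk towards zero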

Enter neural networks (& deep learning)

Once we have a notion of a neuron, it is possible to connect outputs of neurons to inputs of other neurons, giving rise to neural networks. In general we will focus on feedforward neural networks, where these neurons are typically organised in layers, such that the neurons within a single layer process the outputs of the previous layer. The most potent of such architectures (a multilayer perceptron or MLP) fully connects all outputs of a layer to all the neurons in the following layer, as illustrated below.

The output neurons’ weights can be updated by direct application of the previously mentioned gradient descent on a given loss function—for other neurons these losses need to be propagated backwards (by applying the chain rule for partial differentiation), thus giving rise to the backpropagation algorithm. Similarly as for the basic gradient descent algorithm, I will not focus on the mathematical derivations of the algorithm here, as the framework will be taking care of it for us.

By Cybenko’s universal approximation theorem, a (wide enough) MLP with a single hidden layer of sigmoid neurons is capable of approximating any continuous real function on a bounded interval; however, the proof of this theorem is not constructive, and therefore does not offer an efficient training algorithm for learning such structures in general. Deep learning represents a response to this: rather than increasing the width, increase the depth; by definition, anyneural network with more than one hidden layer is considered deep.

The shift in depth also often allows us to directly feed raw input data into the network; in the past, single-layer neural networks were ran on features extracted from the input by carefully crafted feature functions. This meant that significantly different approaches were needed for, e.g. the problems of computer vision, speech recognition and natural language processing, impeding scientific collaboration across these fields. However, when a network has multiple hidden layers, it gains the capability to learn the feature functions that best describe the raw data by itself, thus being applicable to end-to-end learning and allowing one to use the same kind of networks across a wide variety of tasks, eliminating the need for designing feature functions from the pipeline. I will demonstrate graphical evidence of this in the second part of this tutorial, when we will explore convolutional neural networks (CNNs).

Applying a deep MLP to MNIST

As this post’s objective, we will implement the simplest possible deep neural network—an MLP with two hidden layers—and apply it on the MNIST handwritten digit recognition task.

Only the following imports are required:

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense # the two types of neural network layer we will be using
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

Next up, we’ll define some parameters of our model. These are often called hyperparameters, because they are assumed to be fixed before training starts. For the purposes of this tutorial, we will stick to using some sensible values, but keep in mind that properly selecting them is a significant issue, which will be addressed more properly in a future tutorial.

In particular, we will define:
– The batch size, representing the number of training examples being used simultaneously during a single iteration of the gradient descent algorithm;
– The number of epochs, representing the number of times the training algorithm will iterate over the entire training set before terminating;
– The number of neurons in each of the two hidden layers of the MLP.

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 20 # we iterate twenty times over the entire training set
hidden_size = 512 # there will be 512 neurons in both hidden layers

Now it is time to load and preprocess the MNIST data set. Keras makes this extremely simple, with a fixed interface for fetching and extracting the data from the remote server, directly into NumPy arrays.

To preprocess the input data, we will first flatten the images into 1D (as we will consider each pixel as a separate input feature), and we will then force the pixel intensity values to be in the $[0, 1]$ range by dividing them by 255. This is a very simple way to “normalise” the data, and I will be discussing other ways in future tutorials in this series.

A good approach to a classification problem is to use probabilistic classification, i.e. to have a single output neuron for each class, outputting a value which corresponds to the probability of the input being of that particular class. This implies a need to transform the training output data into a “one-hot” encoding: for example, if the desired output class is 3, and there are five classes overall (labelled 0 to 4), then an appropriate one-hot encoding is: [0 0 0 1 0]. Keras, once again, provides us with an out-of-the-box functionality for doing just that.
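
As a quick aside, one-hot encoding is simple enough to sketch in plain NumPy (the tutorial itself uses the Keras utility in the code below); each label just selects a row of the identity matrix. The labels here are arbitrary examples.

import numpy as np

labels = np.array([3, 0, 4])              # integer class labels
num_classes = 5                           # classes labelled 0 to 4
one_hot = np.eye(num_classes)[labels]     # each row picks out one column of the identity matrix
print(one_hot[0])                         # [0. 0. 0. 1. 0.] -- the encoding of class 3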

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(num_train, height * width) # Flatten data to 1D
X_test = X_test.reshape(num_test, height * width) # Flatten data to 1D
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

Now is the time to actually define our model! To do this we will be using a stack of three Dense layers, which correspond to a fully unrestricted MLP structure, linking all of the outputs of the previous layer to the inputs of the next one. We will use ReLU activations for the neurons in the first two layers, and a softmax activation for the neurons in the final one. This activation is designed to turn any real-valued vector into a vector of probabilities, and is defined as follows, for the $j$-th neuron:

$$\sigma(\vec{z})_j = \frac{\exp(z_j)}{\sum_i \exp(z_i)}$$
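
A hedged NumPy sketch of this softmax computation (the shift by the maximum is a common numerical-stability trick; it cancels in the ratio and does not change the result):

import numpy as np

def softmax(z):
    """Convert a real-valued vector into a probability vector."""
    shifted = z - np.max(z)          # improves numerical stability; cancels out in the ratio
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]; sums to 1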

An excellent feature of Keras, that sets it apart from frameworks such as TensorFlow, is automatic inference of shapes; we only need to specify the shape of the input layer, and afterwards Keras will take care of initialising the weight variables with proper shapes. Once all the layers have been defined, we simply need to identify the input(s) and the output(s) in order to define our model, as illustrated below.

inp = Input(shape=(height * width,)) # Our input is a 1D vector of size 784
hidden_1 = Dense(hidden_size, activation='relu')(inp) # First hidden ReLU layer
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1) # Second hidden ReLU layer
out = Dense(num_classes, activation='softmax')(hidden_2) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

To finish off specifying the model, we need to define our loss function, the optimisation algorithm to use, and which metrics to report.

When dealing with probabilistic classification, it is actually better to use the cross-entropy loss, rather than the previously defined squared error. For a particular output probability vector $\vec{y}$, compared with our (ground truth) one-hot vector $\hat{\vec{y}}$, the loss (for $k$-class classification) is defined by

$$L(\vec{y}, \hat{\vec{y}}) = -\sum_{i=1}^{k} \hat{y}_i \ln y_i$$

This loss is better for probabilistic tasks (i.e. ones with logistic/softmax output neurons), primarily because of its manner of derivation—it aims only to maximise the model’s confidence in the correct class, and is not concerned with the distribution of probabilities for other classes (while the squared error loss would dedicate equal attention to getting all of the other class probabilities as close to zero as possible). This is due to the fact that incorrect classes, i.e. classes $i'$ with $\hat{y}_{i'} = 0$, eliminate the respective neuron’s output from the loss function.
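
A minimal sketch of the cross-entropy for a single example, assuming a one-hot ground truth; note how only the predicted probability of the correct class contributes to the loss (the example vectors are arbitrary):

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot ground-truth vector and a predicted probability vector."""
    return -np.sum(y_true * np.log(y_pred + eps))   # eps guards against log(0)

y_true = np.array([0.0, 0.0, 1.0])                         # correct class is 2
print(cross_entropy(y_true, np.array([0.1, 0.1, 0.8])))    # ~0.22 (confident and correct)
print(cross_entropy(y_true, np.array([0.4, 0.4, 0.2])))    # ~1.61 (unsure about the correct class)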

The optimisation algorithm used will typically revolve around some form of gradient descent; their key differences revolve around the manner in which the previously mentioned learning rate, $\eta$, is chosen or adapted during training. An excellent overview of such approaches is given by this blog post; here we will use the Adam optimiser, which typically performs well.

As our classes are balanced (there is an equal amount of handwritten digits across all ten classes), an appropriate metric to report is the accuracy: the proportion of the inputs classified correctly.

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

Finally, we call the training algorithm with the determined batch size and epoch count. It is good practice to set aside a fraction of the training data to be used just for verification that our algorithm is (still) properly generalising (this is commonly referred to as the validation set); here we will hold out 10% of the data for this purpose.

An excellent out-of-the-box feature of Keras is verbosity; it’s able to provide detailed real-time pretty-printing of the training algorithm’s progress.

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 54000 samples, validate on 6000 samples
Epoch 1/20
54000/54000 [==============================] - 9s - loss: 0.2295 - acc: 0.9325 - val_loss: 0.1093 - val_acc: 0.9680
Epoch 2/20
54000/54000 [==============================] - 9s - loss: 0.0819 - acc: 0.9746 - val_loss: 0.0922 - val_acc: 0.9708
Epoch 3/20
54000/54000 [==============================] - 11s - loss: 0.0523 - acc: 0.9835 - val_loss: 0.0788 - val_acc: 0.9772
Epoch 4/20
54000/54000 [==============================] - 12s - loss: 0.0371 - acc: 0.9885 - val_loss: 0.0680 - val_acc: 0.9808
Epoch 5/20
54000/54000 [==============================] - 12s - loss: 0.0274 - acc: 0.9909 - val_loss: 0.0772 - val_acc: 0.9787
Epoch 6/20
54000/54000 [==============================] - 12s - loss: 0.0218 - acc: 0.9931 - val_loss: 0.0718 - val_acc: 0.9808
Epoch 7/20
54000/54000 [==============================] - 12s - loss: 0.0204 - acc: 0.9933 - val_loss: 0.0891 - val_acc: 0.9778
Epoch 8/20
54000/54000 [==============================] - 13s - loss: 0.0189 - acc: 0.9936 - val_loss: 0.0829 - val_acc: 0.9795
Epoch 9/20
54000/54000 [==============================] - 14s - loss: 0.0137 - acc: 0.9950 - val_loss: 0.0835 - val_acc: 0.9797
Epoch 10/20
54000/54000 [==============================] - 13s - loss: 0.0108 - acc: 0.9969 - val_loss: 0.0836 - val_acc: 0.9820
Epoch 11/20
54000/54000 [==============================] - 13s - loss: 0.0123 - acc: 0.9960 - val_loss: 0.0866 - val_acc: 0.9798
Epoch 12/20
54000/54000 [==============================] - 13s - loss: 0.0162 - acc: 0.9951 - val_loss: 0.0780 - val_acc: 0.9838
Epoch 13/20
54000/54000 [==============================] - 12s - loss: 0.0093 - acc: 0.9968 - val_loss: 0.1019 - val_acc: 0.9813
Epoch 14/20
54000/54000 [==============================] - 12s - loss: 0.0075 - acc: 0.9976 - val_loss: 0.0923 - val_acc: 0.9818
Epoch 15/20
54000/54000 [==============================] - 12s - loss: 0.0118 - acc: 0.9965 - val_loss: 0.1176 - val_acc: 0.9772
Epoch 16/20
54000/54000 [==============================] - 12s - loss: 0.0119 - acc: 0.9961 - val_loss: 0.0838 - val_acc: 0.9803
Epoch 17/20
54000/54000 [==============================] - 12s - loss: 0.0073 - acc: 0.9976 - val_loss: 0.0808 - val_acc: 0.9837
Epoch 18/20
54000/54000 [==============================] - 13s - loss: 0.0082 - acc: 0.9974 - val_loss: 0.0926 - val_acc: 0.9822
Epoch 19/20
54000/54000 [==============================] - 12s - loss: 0.0070 - acc: 0.9979 - val_loss: 0.0808 - val_acc: 0.9835
Epoch 20/20
54000/54000 [==============================] - 11s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.1010 - val_acc: 0.9822
10000/10000 [==============================] - 1s





[0.099321320021623111, 0.9819]

As can be seen, our model achieves an accuracy of ~98.2% on the test set; this is quite respectable for such a simple model, despite being outclassed by state-of-the-art approaches enumerated here.

I encourage you to play around with this model: attempt different hyperparameter values/optimisation algorithms/activation functions, add more hidden layers, etc. Eventually, you should be able to achieve accuracies above 99%.
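
As one possible starting point for such experiments, here is a hedged sketch (using the same Keras 1 functional API as the rest of this post, and assuming the variables defined earlier are still in scope) of a deeper variant with dropout after each hidden layer. Treat the dropout rate and layer count as arbitrary choices, not recommendations.

from keras.layers import Input, Dense, Dropout

# assumes height, width, hidden_size and num_classes are defined as earlier in this post
inp = Input(shape=(height * width,))
hidden_1 = Dropout(0.2)(Dense(hidden_size, activation='relu')(inp))
hidden_2 = Dropout(0.2)(Dense(hidden_size, activation='relu')(hidden_1))
hidden_3 = Dropout(0.2)(Dense(hidden_size, activation='relu')(hidden_2))
out = Dense(num_classes, activation='softmax')(hidden_3)
# the Model definition, compilation and fitting then proceed exactly as before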

Conclusion

Throughout this post we have covered the essentials of deep learning, and successfully implemented a simple two-layer deep MLP in Keras, applying it to MNIST, all in under 30 lines of code.

Next time around, we will explore convolutional neural networks (CNNs), resolving some of the issues posed by applying MLPs to larger image tasks (such as CIFAR-10).

ABOUT THE AUTHOR

Petar Veličković

Petar is currently a Research Assistant in Computational Biology within the Artificial Intelligence Group of the Cambridge University Computer Laboratory, where he is working on developing machine learning algorithms on complex networks, and their applications to bioinformatics. He is also a PhD student within the group, supervised by Dr Pietro Liò and affiliated with Trinity College. He holds a BA degree in Computer Science from the University of Cambridge, having completed the Computer Science Tripos in 2015.

Just show me the code!

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense # the two types of neural network layer we will be using
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 20 # we iterate twenty times over the entire training set
hidden_size = 512 # there will be 512 neurons in both hidden layers

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(num_train, height * width) # Flatten data to 1D
X_test = X_test.reshape(num_test, height * width) # Flatten data to 1D
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(height * width,)) # Our input is a 1D vector of size 784
hidden_1 = Dense(hidden_size, activation='relu')(inp) # First hidden ReLU layer
hidden_2 = Dense(hidden_size, activation='relu')(hidden_1) # Second hidden ReLU layer
out = Dense(num_classes, activation='softmax')(hidden_2) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!

 

Deep learning for complete beginners: Using convolutional nets to recognise images, by Cambridge Coding Academy | Download notebook

Introduction

Welcome to the second in a series of blog posts that is designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purposes of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes—airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

MNIST CIFAR-10

Last time around, I introduced the fundamental concepts of deep learning, and illustrated how models can be rapidly developed and prototyped by leveraging the Keras deep learning framework. Ultimately, a two-layer multilayer perceptron (MLP) was applied to MNIST, achieving an accuracy level of 98.2%, which can be quite easily improved upon. However, fully connected MLPs will usually not be the model of choice for image-related tasks—it is far more typical to take advantage of a convolutional neural network (CNN) in this case. By the end of this part of the tutorial, you should be capable of understanding and producing a simple CNN in Keras, achieving a respectable level of accuracy on CIFAR-10.

This tutorial will, for the most part, assume familiarity with the previous one in the series.

Image processing

The previously mentioned multilayer perceptrons represent the most general and powerful feedforward neural network model possible; they are organised in layers, such that every neuron within a layer receives its own copy of all the outputs of the previous layer as its input. This kind of model is perfect for the right kind of problem—learning from a fixed number of (more or less) unstructured parameters.

However, consider what happens to the number of parameters (weights) of such a model when being fed raw image data. CIFAR-10, for example, contains 32×32×3 coloured images: if we are to treat each channel of each pixel as an independent input to an MLP, each neuron of the first hidden layer adds ~3,000 new parameters to the model! The situation quickly becomes unmanageable as image sizes grow larger, way before reaching the kind of images people usually want to work with in real applications.

A common solution is to downsample the images to a size where MLPs can safely be applied. However, if we directly downsample the image, we potentially lose a wealth of information; it would be great if we would somehow be able to still do some useful (without causing an explosion in parameter count) processing of the image, prior to performing the downsampling.

Convolutions

It turns out that there is a very efficient way of pulling this off, and it takes advantage of the structure of the information encoded within an image—it is assumed that pixels that are spatially closer together will “cooperate” on forming a particular feature of interest much more than ones on opposite corners of the image. Also, if a particular (smaller) feature is found to be of great importance when defining an image’s label, it will be equally important if this feature was found anywhere within the image, regardless of location.

Enter the convolution operator. Given a two-dimensional image, $I$, and a small matrix, $K$, of size $h \times w$ (known as a convolution kernel), which we assume encodes a way of extracting an interesting image feature, we compute the convolved image, $I * K$, by overlaying the kernel on top of the image in all possible ways, and recording the sum of elementwise products between the image and the kernel:

$$(I * K)_{xy} = \sum_{i=1}^{h} \sum_{j=1}^{w} K_{ij} \cdot I_{x+i-1,\, y+j-1}$$

(in fact, the exact definition would require us to flip the kernel matrix first, but for the purposes of machine learning it is irrelevant whether this is done)
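
To make the formula concrete, here is a hedged, deliberately naive NumPy sketch of this operation (more precisely cross-correlation, since the kernel is not flipped, which as just noted is irrelevant here) for a single-channel image, considering only positions where the kernel fits entirely inside the image ("valid" handling of the edges). The example image and kernel are arbitrary.

import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' convolution (no kernel flip) of a 2D image with a 2D kernel."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # sum of elementwise products of the kernel and the image patch it overlays
            out[x, y] = np.sum(kernel * image[x:x + h, y:y + w])
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # a simple vertical edge detector
print(convolve2d(image, edge_kernel))            # 3x3 output; constant 6.0 for this ramp image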

The images below show a diagrammatical overview of the above formula and the result of applying convolution (with two separate kernels) over an image, to act as an edge detector:


Convolutional and pooling layers

The convolution operator forms the fundamental basis of the convolutional layer of a CNN. The layer is completely specified by a certain number of kernels, $\vec{K}$ (along with additive biases, $\vec{b}$, per each kernel), and it operates by computing the convolution of the output images of a previous layer with each of those kernels, afterwards adding the biases (one per each output image). Finally, an activation function, $\sigma$, may be applied to all of the pixels of the output images. Typically, the input to a convolutional layer will have $d$ channels (e.g. red/green/blue in the input layer), in which case the kernels are extended to have this number of channels as well, making the final formula of a single output image channel of a convolutional layer (for a kernel $K$ and bias $b$) as follows:

$$\mathrm{conv}(I, K)_{xy} = \sigma\left(b + \sum_{i=1}^{h} \sum_{j=1}^{w} \sum_{k=1}^{d} K_{ijk} \cdot I_{x+i-1,\, y+j-1,\, k}\right)$$

Note that, since all we’re doing here is addition and scaling of the input pixels, the kernels may be learned from a given training dataset via gradient descent, exactly as the weights of an MLP. In fact, an MLP is perfectly capable of replicating a convolutional layer, but it would require a lot more training time (and data) to learn to approximate that mode of operation.

Finally, let’s just note that a convolutional operator is in no way restricted to two-dimensionally structured data: in fact, most machine learning frameworks (Keras included) will provide you with out-of-the-box layers for 1D and 3D convolutions as well!

It is important to note that, while a convolutional layer significantly decreases the number of parameters compared to a fully connected (FC) layer, it introduces more hyperparameters—parameters whose values need to be chosen before training starts.

Namely, the hyperparameters to choose within a single convolutional layer are:
depth: how many different kernels (and biases) will be convolved with the output of the previous layer;
height and width of each kernel;
stride: by how much we shift the kernel in each step to compute the next pixel in the result. This specifies the overlap between individual output pixels, and typically it is set to 1, corresponding to the formula given before. Note that larger strides result in smaller output sizes.
padding: note that convolution by any kernel larger than 1×1 will decrease the output image size—it is often desirable to keep sizes the same, in which case the image is sufficiently padded with zeroes at the edges. This is often called “same” padding, as opposed to “valid” (no) padding. It is possible to add arbitrary levels of padding, but typically the padding of choice will be either same or valid. (A small helper after this list illustrates how kernel size, stride and padding together determine the output size.)
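
To see how these choices interact, the following small helper is a sketch based on the standard output-size relationship for convolutions (nothing Keras-specific): with input size n, kernel size k, stride s and padding p, the output size is floor((n - k + 2p) / s) + 1.

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n - k + 2p) / s) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(32, 3))                        # 30 -- 'valid' padding shrinks the image
print(conv_output_size(32, 3, padding=1))             # 32 -- 'same' padding preserves the size
print(conv_output_size(32, 3, stride=2, padding=1))   # 16 -- a larger stride halves it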

As already hinted, convolutions are not typically meant to be the sole operation in a CNN (although there have been promising recent developments on all-convolutional networks); but rather to extract useful features of an image prior to downsampling it sufficiently to be manageable by an MLP.

A very popular approach to downsampling is a pooling layer, which consumes small and (usually) disjoint chunks of the image (typically 2×2) and aggregates them into a single value. There are several possible schemes for the aggregation—the most popular being max-pooling, where the maximum pixel value within each chunk is taken. A diagrammatical illustration of 2×2 max-pooling is given below.
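
A minimal NumPy sketch of 2×2 max-pooling on a single channel (assuming, for simplicity, that the image dimensions are divisible by the pool size); the example image is arbitrary:

import numpy as np

def max_pool_2x2(image):
    """2x2 max-pooling: take the maximum of each disjoint 2x2 block."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)   # group pixels into disjoint 2x2 blocks
    return blocks.max(axis=(1, 3))

image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 8, 3, 2],
                  [7, 6, 1, 0]], dtype=float)
print(max_pool_2x2(image))   # [[4. 8.] [9. 3.]]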

Putting it all together: a common CNN

Now that we got all the building blocks, let’s see what a typical convolutional neural network might look like!

A typical CNN architecture for $k$-class image classification can be split into two distinct parts—a chain of repeating Conv→Pool layers (sometimes with more than one convolutional layer at once), followed by a few fully connected layers (taking each pixel of the computed images as an independent input), culminating in a $k$-way softmax layer, to which a cross-entropy loss is optimised. I did not draw the activation functions here to make the sketch clearer, but do keep in mind that typically after every convolutional or fully connected layer, an activation (e.g. ReLU) will be applied to all of the outputs.

Note the effect of a single Conv→Pool pass through the image: it reduces height and width of the individual channels in favour of their number, i.e. depth.

The softmax layer and cross-entropy loss are both introduced in more detail in the previous tutorial. For summarisation purposes, a softmax layer’s purpose is converting any vector of real numbers into a vector of probabilities (nonnegative real values that add up to 1). Within this context, the probabilities correspond to the likelihoods that an input image is a member of a particular class. Minimising the cross-entropy loss has the effect of maximising the model’s confidence in the correct class, without being concerned for the probabilities for other classes—this makes it a more suitable choice for probabilistic tasks compared to, for example, the squared error loss.

Detour: Overfitting, regularisation and dropout

This will be the first (and hopefully the only) time when I will divert your attention to a seemingly unrelated topic. It regards a very important pitfall of machine learning—overfitting a model to the training data. While this is primarily going to be a major topic of the next tutorial in the series, the negative effects of overfitting will tend to become quite noticeable on the networks like the one we are about to build, and we need to introduce a way to properly protect ourselves against it, before going any further. Luckily, there is a very simple technique we can use.

Overfitting corresponds to adapting our model to the training set to such extremes that its generalisation potential (performance on samples outside of the training set) is severely limited. In other words, our model might have learned the training set (along with any noise present within it) perfectly, but it has failed to capture the underlying process that generated it. To illustrate, consider a problem of fitting a sine curve, with white additive noise applied to the data points:

Here we have a training set (denoted by blue circles) derived from the original sine wave, along with some noise. If we fit a degree-3 polynomial to this data, we get a fairly good approximation to the original curve. Someone might argue that a degree-14 polynomial would do better; indeed, given we have 15 points, such a fit would perfectly describe the training data. However, in this case, the additional parameters of the model cause catastrophic results: to cope with the inherent noise of the data, anywhere except in the closest vicinity of the training points, our fit is completely off.

Deep convolutional neural networks have a large number of parameters, especially in the fully connected layers. Overfitting might often manifest in the following form: if we don’t have sufficiently many training examples, a small group of neurons might become responsible for doing most of the processing, with other neurons becoming redundant; or in the other extreme, some neurons might actually become detrimental to performance, with several other neurons of their layer ending up doing nothing else but correcting for their errors.

To help our models generalise better in these circumstances, we introduce techniques of regularisation: rather than reducing the number of parameters, we impose constraints on the model parameters during training to keep them from learning the noise in the training data. The particular method I will introduce here is dropout—a technique that initially might seem like “dark magic”, but actually helps to eliminate exactly the failure modes described above. Namely, dropout with parameter pp will, within a single training iteration, go through all neurons in a particular layer and, with probability pp, completely eliminate them from the network throughout the iteration. This has the effect of forcing the neural network to cope with failures, and not to rely on existence of a particular neuron (or set of neurons)—relying more on a consensus of several neurons within a layer. This is a very simple technique that works quite well already for combatting overfitting on its own, without introducing further regularisers. An illustration is given below.
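
A hedged sketch of what dropout does to a layer's activations during a single training iteration, in the "inverted" form where the surviving activations are rescaled so no adjustment is needed at test time; Keras handles all of this internally via its Dropout layer, so this is purely illustrative.

import numpy as np

def dropout(activations, p, training=True):
    """Inverted dropout: drop each activation with probability p during training,
    scaling the survivors by 1/(1-p) so the expected value stays the same."""
    if not training:
        return activations                          # no dropout when evaluating the model
    mask = np.random.rand(*activations.shape) >= p  # keep each unit with probability 1-p
    return activations * mask / (1.0 - p)

layer_output = np.ones(10)
print(dropout(layer_output, p=0.5))   # roughly half the entries zeroed, the rest scaled to 2.0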

Applying a deep CNN to CIFAR-10

As this post’s objective, we will implement a deep convolutional neural network—and apply it on the CIFAR-10 image classification task.

Imports are largely similar to last time, apart from the fact that we will be using a wider variety of layers:

from keras.datasets import cifar10 # subroutines for fetching the CIFAR-10 dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Convolution2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
import numpy as np
Using Theano backend.

As already mentioned, a CNN will typically have more hyperparameters than an MLP. For the purposes of this tutorial, we will also stick to “sensible” hand-picked values for them, but do still keep in mind that later on I will introduce a more proper method for learning them.

The hyperparameters are:
– The batch size, representing the number of training examples being used simultaneously during a single iteration of the gradient descent algorithm;
– The number of epochs, representing the number of times the training algorithm will iterate over the entire training set before terminating*;
– The kernel sizes in the convolutional layers;
– The pooling size in the pooling layers;
– The number of kernels in the convolutional layers;
– The dropout probability (we will apply dropout after each pooling, and after the fully connected layer);
– The number of neurons in the fully connected layer of the MLP.

* N.B. here I have set the number of epochs to 200, which might be undesirably slow if you do not have a GPU at your disposal (the convolution layers are going to pose a significant performance bottleneck in this case). You might wish to decrease the epoch count and/or numbers of kernels if you are going to be training the network on a CPU.

batch_size = 32 # in each iteration, we consider 32 training examples at once
num_epochs = 200 # we iterate 200 times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth_1 = 32 # we will initially have 32 kernels per conv. layer...
conv_depth_2 = 64 # ...switching to 64 after the first pooling layer
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 512 # the FC layer will have 512 neurons

Loading and preprocessing the CIFAR-10 dataset is done in exactly the same way as for MNIST, with Keras routines doing most of the work. The sole difference is that now we do not initially consider each pixel an independent input feature, and therefore we do not reshape the input to 1D. We will once again force the pixel intensity values to be in the $[0, 1]$ range, and use a one-hot encoding for the output labels.

However, this time around, this stage will be done in a more general way, to allow you to adapt it more easily to new datasets: the sizes will be extracted from the dataset rather than hardcoded, the number of classes is inferred from the number of unique labels in the training set, and the normalisation is performed via division by the maximum value in the training set.

N.B. we will divide the testing set by the maximum of the training set, because our algorithms are not allowed to see the testing data before the learning process is complete, and therefore we are not allowed to compute any statistics on it, other than performing transformations derived entirely from the training set.

(X_train, y_train), (X_test, y_test) = cifar10.load_data() # fetch CIFAR-10 data

num_train, depth, height, width = X_train.shape # there are 50000 training examples in CIFAR-10 
num_test = X_test.shape[0] # there are 10000 test examples in CIFAR-10
num_classes = np.unique(y_train).shape[0] # there are 10 image classes

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= np.max(X_train) # Normalise data to [0, 1] range
X_test /= np.max(X_train) # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

Modelling time! Our network will consist of four Convolution2D layers, with a MaxPooling2D layer following after the second and the fourth convolution. After the first pooling layer, we double the number of kernels (in line with the previously mentioned principle of sacrificing height and width for more depth). Afterwards, the output of the second pooling layer is flattened to 1D (via the Flatten layer), and passed through two fully connected (Dense) layers. ReLU activations will once again be used for all layers except the output dense layer, which will use a softmax activation (for purposes of probabilistic classification).

To regularise our model, a Dropout layer is applied after each pooling layer, and after the first Dense layer. This is another area where Keras shines compared to other frameworks: it has an internal flag that automatically enables or disables dropout, depending on whether the model is currently used for training or testing.

The remainder of the model specification exactly matches our previous setup for MNIST:
– We use the cross-entropy loss function as the objective to optimise (as its derivation is more appropriate for probabilistic tasks);
– We use the Adam optimiser for gradient descent;
– We report the accuracy of the model (as the dataset is balanced across the ten classes)*;
– We hold out 10% of the data for validation purposes.

* To get a feeling for why accuracy might be inappropriate for unbalanced datasets, consider an extreme case where 90% of the test data belongs to class x (this could be, for example, the task of diagnosing patients for an extremely rare disease). In this case, a classifier that just outputs x achieves a seemingly impressive accuracy of 90% on the test data, without really doing any learning/generalisation.
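
To see this numerically, here is a toy computation (with made-up labels, unrelated to CIFAR-10) showing how a classifier that never looks at its input still scores 90% accuracy on a 90/10 class split:

import numpy as np

y_true = np.array([0] * 900 + [1] * 100)  # hypothetical imbalanced test labels: 90% class 0
y_pred = np.zeros_like(y_true)            # a "classifier" that always predicts class 0
print(np.mean(y_pred == y_true))          # 0.9 accuracy, despite no learning whatsoever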

inp = Input(shape=(depth, height, width)) # N.B. depth goes first in Keras!
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
# Conv [64] -> Conv [64] -> Pool (with dropout on the pooling layer)
conv_3 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(drop_1)
conv_4 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_3)
pool_2 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_4)
drop_2 = Dropout(drop_prob_1)(pool_2)
# Now flatten to 1D, apply FC -> ReLU (with dropout) -> softmax
flat = Flatten()(drop_2)
hidden = Dense(hidden_size, activation='relu')(flat)
drop_3 = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop_3)

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 45000 samples, validate on 5000 samples
Epoch 1/200
45000/45000 [==============================] - 9s - loss: 1.5435 - acc: 0.4359 - val_loss: 1.2057 - val_acc: 0.5672
Epoch 2/200
45000/45000 [==============================] - 9s - loss: 1.1544 - acc: 0.5886 - val_loss: 0.9679 - val_acc: 0.6566
Epoch 3/200
45000/45000 [==============================] - 8s - loss: 1.0114 - acc: 0.6418 - val_loss: 0.8807 - val_acc: 0.6870
Epoch 4/200
45000/45000 [==============================] - 8s - loss: 0.9183 - acc: 0.6766 - val_loss: 0.7945 - val_acc: 0.7224
Epoch 5/200
45000/45000 [==============================] - 9s - loss: 0.8507 - acc: 0.6994 - val_loss: 0.7531 - val_acc: 0.7400
Epoch 6/200
45000/45000 [==============================] - 9s - loss: 0.8064 - acc: 0.7161 - val_loss: 0.7174 - val_acc: 0.7496
Epoch 7/200
45000/45000 [==============================] - 9s - loss: 0.7561 - acc: 0.7331 - val_loss: 0.7116 - val_acc: 0.7622
Epoch 8/200
45000/45000 [==============================] - 9s - loss: 0.7156 - acc: 0.7476 - val_loss: 0.6773 - val_acc: 0.7670
Epoch 9/200
45000/45000 [==============================] - 9s - loss: 0.6833 - acc: 0.7594 - val_loss: 0.6855 - val_acc: 0.7644
Epoch 10/200
45000/45000 [==============================] - 9s - loss: 0.6580 - acc: 0.7656 - val_loss: 0.6608 - val_acc: 0.7748
Epoch 11/200
45000/45000 [==============================] - 9s - loss: 0.6308 - acc: 0.7750 - val_loss: 0.6854 - val_acc: 0.7730
Epoch 12/200
45000/45000 [==============================] - 9s - loss: 0.6035 - acc: 0.7832 - val_loss: 0.6853 - val_acc: 0.7744
Epoch 13/200
45000/45000 [==============================] - 9s - loss: 0.5871 - acc: 0.7914 - val_loss: 0.6762 - val_acc: 0.7748
Epoch 14/200
45000/45000 [==============================] - 8s - loss: 0.5693 - acc: 0.8000 - val_loss: 0.6868 - val_acc: 0.7740
Epoch 15/200
45000/45000 [==============================] - 9s - loss: 0.5555 - acc: 0.8036 - val_loss: 0.6835 - val_acc: 0.7792
Epoch 16/200
45000/45000 [==============================] - 9s - loss: 0.5370 - acc: 0.8126 - val_loss: 0.6885 - val_acc: 0.7774
Epoch 17/200
45000/45000 [==============================] - 9s - loss: 0.5270 - acc: 0.8134 - val_loss: 0.6604 - val_acc: 0.7866
Epoch 18/200
45000/45000 [==============================] - 9s - loss: 0.5090 - acc: 0.8194 - val_loss: 0.6652 - val_acc: 0.7860
Epoch 19/200
45000/45000 [==============================] - 9s - loss: 0.5066 - acc: 0.8193 - val_loss: 0.6632 - val_acc: 0.7858
Epoch 20/200
45000/45000 [==============================] - 9s - loss: 0.4938 - acc: 0.8248 - val_loss: 0.6844 - val_acc: 0.7872
Epoch 21/200
45000/45000 [==============================] - 9s - loss: 0.4684 - acc: 0.8361 - val_loss: 0.6861 - val_acc: 0.7904
Epoch 22/200
45000/45000 [==============================] - 9s - loss: 0.4696 - acc: 0.8365 - val_loss: 0.6349 - val_acc: 0.7980
Epoch 23/200
45000/45000 [==============================] - 9s - loss: 0.4584 - acc: 0.8387 - val_loss: 0.6592 - val_acc: 0.7926
Epoch 24/200
45000/45000 [==============================] - 9s - loss: 0.4410 - acc: 0.8443 - val_loss: 0.6822 - val_acc: 0.7876
Epoch 25/200
45000/45000 [==============================] - 8s - loss: 0.4404 - acc: 0.8454 - val_loss: 0.7103 - val_acc: 0.7784
Epoch 26/200
45000/45000 [==============================] - 8s - loss: 0.4276 - acc: 0.8512 - val_loss: 0.6783 - val_acc: 0.7858
Epoch 27/200
45000/45000 [==============================] - 8s - loss: 0.4152 - acc: 0.8542 - val_loss: 0.6657 - val_acc: 0.7944
Epoch 28/200
45000/45000 [==============================] - 9s - loss: 0.4107 - acc: 0.8549 - val_loss: 0.6861 - val_acc: 0.7888
Epoch 29/200
45000/45000 [==============================] - 9s - loss: 0.4115 - acc: 0.8548 - val_loss: 0.6634 - val_acc: 0.7996
Epoch 30/200
45000/45000 [==============================] - 9s - loss: 0.4057 - acc: 0.8586 - val_loss: 0.7166 - val_acc: 0.7896
Epoch 31/200
45000/45000 [==============================] - 9s - loss: 0.3992 - acc: 0.8605 - val_loss: 0.6734 - val_acc: 0.7998
Epoch 32/200
45000/45000 [==============================] - 9s - loss: 0.3863 - acc: 0.8637 - val_loss: 0.7263 - val_acc: 0.7844
Epoch 33/200
45000/45000 [==============================] - 9s - loss: 0.3933 - acc: 0.8644 - val_loss: 0.6953 - val_acc: 0.7860
Epoch 34/200
45000/45000 [==============================] - 9s - loss: 0.3838 - acc: 0.8663 - val_loss: 0.7040 - val_acc: 0.7916
Epoch 35/200
45000/45000 [==============================] - 9s - loss: 0.3800 - acc: 0.8674 - val_loss: 0.7233 - val_acc: 0.7970
Epoch 36/200
45000/45000 [==============================] - 9s - loss: 0.3775 - acc: 0.8697 - val_loss: 0.7234 - val_acc: 0.7922
Epoch 37/200
45000/45000 [==============================] - 9s - loss: 0.3681 - acc: 0.8746 - val_loss: 0.6751 - val_acc: 0.7958
Epoch 38/200
45000/45000 [==============================] - 9s - loss: 0.3679 - acc: 0.8732 - val_loss: 0.7014 - val_acc: 0.7976
Epoch 39/200
45000/45000 [==============================] - 9s - loss: 0.3540 - acc: 0.8769 - val_loss: 0.6768 - val_acc: 0.8022
Epoch 40/200
45000/45000 [==============================] - 9s - loss: 0.3531 - acc: 0.8783 - val_loss: 0.7171 - val_acc: 0.7986
Epoch 41/200
45000/45000 [==============================] - 9s - loss: 0.3545 - acc: 0.8786 - val_loss: 0.7164 - val_acc: 0.7930
Epoch 42/200
45000/45000 [==============================] - 9s - loss: 0.3453 - acc: 0.8799 - val_loss: 0.7078 - val_acc: 0.7994
Epoch 43/200
45000/45000 [==============================] - 8s - loss: 0.3488 - acc: 0.8798 - val_loss: 0.7272 - val_acc: 0.7958
Epoch 44/200
45000/45000 [==============================] - 9s - loss: 0.3471 - acc: 0.8797 - val_loss: 0.7110 - val_acc: 0.7916
Epoch 45/200
45000/45000 [==============================] - 9s - loss: 0.3443 - acc: 0.8810 - val_loss: 0.7391 - val_acc: 0.7952
Epoch 46/200
45000/45000 [==============================] - 9s - loss: 0.3342 - acc: 0.8841 - val_loss: 0.7351 - val_acc: 0.7970
Epoch 47/200
45000/45000 [==============================] - 9s - loss: 0.3311 - acc: 0.8842 - val_loss: 0.7302 - val_acc: 0.8008
Epoch 48/200
45000/45000 [==============================] - 9s - loss: 0.3320 - acc: 0.8868 - val_loss: 0.7145 - val_acc: 0.8002
Epoch 49/200
45000/45000 [==============================] - 9s - loss: 0.3264 - acc: 0.8883 - val_loss: 0.7640 - val_acc: 0.7942
Epoch 50/200
45000/45000 [==============================] - 9s - loss: 0.3247 - acc: 0.8880 - val_loss: 0.7289 - val_acc: 0.7948
Epoch 51/200
45000/45000 [==============================] - 9s - loss: 0.3279 - acc: 0.8886 - val_loss: 0.7340 - val_acc: 0.7910
Epoch 52/200
45000/45000 [==============================] - 9s - loss: 0.3224 - acc: 0.8901 - val_loss: 0.7454 - val_acc: 0.7914
Epoch 53/200
45000/45000 [==============================] - 9s - loss: 0.3219 - acc: 0.8916 - val_loss: 0.7328 - val_acc: 0.8016
Epoch 54/200
45000/45000 [==============================] - 9s - loss: 0.3163 - acc: 0.8919 - val_loss: 0.7442 - val_acc: 0.7996
Epoch 55/200
45000/45000 [==============================] - 9s - loss: 0.3071 - acc: 0.8962 - val_loss: 0.7427 - val_acc: 0.7898
Epoch 56/200
45000/45000 [==============================] - 9s - loss: 0.3158 - acc: 0.8944 - val_loss: 0.7685 - val_acc: 0.7920
Epoch 57/200
45000/45000 [==============================] - 8s - loss: 0.3126 - acc: 0.8942 - val_loss: 0.7717 - val_acc: 0.8062
Epoch 58/200
45000/45000 [==============================] - 9s - loss: 0.3156 - acc: 0.8919 - val_loss: 0.6993 - val_acc: 0.7984
Epoch 59/200
45000/45000 [==============================] - 9s - loss: 0.3030 - acc: 0.8970 - val_loss: 0.7359 - val_acc: 0.8016
Epoch 60/200
45000/45000 [==============================] - 9s - loss: 0.3022 - acc: 0.8969 - val_loss: 0.7427 - val_acc: 0.7954
Epoch 61/200
45000/45000 [==============================] - 9s - loss: 0.3072 - acc: 0.8950 - val_loss: 0.7829 - val_acc: 0.7996
Epoch 62/200
45000/45000 [==============================] - 9s - loss: 0.2977 - acc: 0.8996 - val_loss: 0.8096 - val_acc: 0.7958
Epoch 63/200
45000/45000 [==============================] - 9s - loss: 0.3033 - acc: 0.8983 - val_loss: 0.7424 - val_acc: 0.7972
Epoch 64/200
45000/45000 [==============================] - 9s - loss: 0.2985 - acc: 0.9003 - val_loss: 0.7779 - val_acc: 0.7930
Epoch 65/200
45000/45000 [==============================] - 8s - loss: 0.2931 - acc: 0.9004 - val_loss: 0.7302 - val_acc: 0.8010
Epoch 66/200
45000/45000 [==============================] - 8s - loss: 0.2948 - acc: 0.8994 - val_loss: 0.7861 - val_acc: 0.7900
Epoch 67/200
45000/45000 [==============================] - 9s - loss: 0.2911 - acc: 0.9026 - val_loss: 0.7502 - val_acc: 0.7918
Epoch 68/200
45000/45000 [==============================] - 9s - loss: 0.2951 - acc: 0.9001 - val_loss: 0.7911 - val_acc: 0.7820
Epoch 69/200
45000/45000 [==============================] - 9s - loss: 0.2869 - acc: 0.9026 - val_loss: 0.8025 - val_acc: 0.8024
Epoch 70/200
45000/45000 [==============================] - 8s - loss: 0.2933 - acc: 0.9013 - val_loss: 0.7703 - val_acc: 0.7978
Epoch 71/200
45000/45000 [==============================] - 8s - loss: 0.2902 - acc: 0.9007 - val_loss: 0.7685 - val_acc: 0.7962
Epoch 72/200
45000/45000 [==============================] - 9s - loss: 0.2920 - acc: 0.9025 - val_loss: 0.7412 - val_acc: 0.7956
Epoch 73/200
45000/45000 [==============================] - 8s - loss: 0.2861 - acc: 0.9038 - val_loss: 0.7957 - val_acc: 0.8026
Epoch 74/200
45000/45000 [==============================] - 8s - loss: 0.2785 - acc: 0.9069 - val_loss: 0.7522 - val_acc: 0.8002
Epoch 75/200
45000/45000 [==============================] - 9s - loss: 0.2811 - acc: 0.9064 - val_loss: 0.8181 - val_acc: 0.7902
Epoch 76/200
45000/45000 [==============================] - 9s - loss: 0.2841 - acc: 0.9053 - val_loss: 0.7695 - val_acc: 0.7990
Epoch 77/200
45000/45000 [==============================] - 9s - loss: 0.2853 - acc: 0.9061 - val_loss: 0.7608 - val_acc: 0.7972
Epoch 78/200
45000/45000 [==============================] - 9s - loss: 0.2714 - acc: 0.9080 - val_loss: 0.7534 - val_acc: 0.8034
Epoch 79/200
45000/45000 [==============================] - 9s - loss: 0.2797 - acc: 0.9072 - val_loss: 0.7188 - val_acc: 0.7988
Epoch 80/200
45000/45000 [==============================] - 9s - loss: 0.2682 - acc: 0.9110 - val_loss: 0.7751 - val_acc: 0.7954
Epoch 81/200
45000/45000 [==============================] - 9s - loss: 0.2885 - acc: 0.9038 - val_loss: 0.7711 - val_acc: 0.8010
Epoch 82/200
45000/45000 [==============================] - 9s - loss: 0.2705 - acc: 0.9094 - val_loss: 0.7613 - val_acc: 0.8000
Epoch 83/200
45000/45000 [==============================] - 9s - loss: 0.2738 - acc: 0.9095 - val_loss: 0.8300 - val_acc: 0.7944
Epoch 84/200
45000/45000 [==============================] - 9s - loss: 0.2795 - acc: 0.9066 - val_loss: 0.8001 - val_acc: 0.7912
Epoch 85/200
45000/45000 [==============================] - 9s - loss: 0.2721 - acc: 0.9086 - val_loss: 0.7862 - val_acc: 0.8092
Epoch 86/200
45000/45000 [==============================] - 9s - loss: 0.2752 - acc: 0.9087 - val_loss: 0.7331 - val_acc: 0.7942
Epoch 87/200
45000/45000 [==============================] - 9s - loss: 0.2725 - acc: 0.9089 - val_loss: 0.7999 - val_acc: 0.7914
Epoch 88/200
45000/45000 [==============================] - 9s - loss: 0.2644 - acc: 0.9108 - val_loss: 0.7944 - val_acc: 0.7990
Epoch 89/200
45000/45000 [==============================] - 9s - loss: 0.2725 - acc: 0.9106 - val_loss: 0.7622 - val_acc: 0.8006
Epoch 90/200
45000/45000 [==============================] - 9s - loss: 0.2622 - acc: 0.9129 - val_loss: 0.8172 - val_acc: 0.7988
Epoch 91/200
45000/45000 [==============================] - 9s - loss: 0.2772 - acc: 0.9085 - val_loss: 0.8243 - val_acc: 0.8004
Epoch 92/200
45000/45000 [==============================] - 9s - loss: 0.2609 - acc: 0.9136 - val_loss: 0.7723 - val_acc: 0.7992
Epoch 93/200
45000/45000 [==============================] - 9s - loss: 0.2666 - acc: 0.9129 - val_loss: 0.8366 - val_acc: 0.7932
Epoch 94/200
45000/45000 [==============================] - 9s - loss: 0.2593 - acc: 0.9135 - val_loss: 0.8666 - val_acc: 0.7956
Epoch 95/200
45000/45000 [==============================] - 9s - loss: 0.2692 - acc: 0.9100 - val_loss: 0.8901 - val_acc: 0.7954
Epoch 96/200
45000/45000 [==============================] - 8s - loss: 0.2569 - acc: 0.9160 - val_loss: 0.8515 - val_acc: 0.8006
Epoch 97/200
45000/45000 [==============================] - 8s - loss: 0.2636 - acc: 0.9146 - val_loss: 0.8639 - val_acc: 0.7960
Epoch 98/200
45000/45000 [==============================] - 9s - loss: 0.2693 - acc: 0.9113 - val_loss: 0.7891 - val_acc: 0.7916
Epoch 99/200
45000/45000 [==============================] - 9s - loss: 0.2611 - acc: 0.9144 - val_loss: 0.8650 - val_acc: 0.7928
Epoch 100/200
45000/45000 [==============================] - 9s - loss: 0.2589 - acc: 0.9121 - val_loss: 0.8683 - val_acc: 0.7990
Epoch 101/200
45000/45000 [==============================] - 9s - loss: 0.2601 - acc: 0.9142 - val_loss: 0.9116 - val_acc: 0.8030
Epoch 102/200
45000/45000 [==============================] - 9s - loss: 0.2616 - acc: 0.9138 - val_loss: 0.8229 - val_acc: 0.7928
Epoch 103/200
45000/45000 [==============================] - 9s - loss: 0.2603 - acc: 0.9140 - val_loss: 0.8847 - val_acc: 0.7994
Epoch 104/200
45000/45000 [==============================] - 9s - loss: 0.2579 - acc: 0.9150 - val_loss: 0.9079 - val_acc: 0.8004
Epoch 105/200
45000/45000 [==============================] - 8s - loss: 0.2696 - acc: 0.9127 - val_loss: 0.7450 - val_acc: 0.8002
Epoch 106/200
45000/45000 [==============================] - 9s - loss: 0.2555 - acc: 0.9161 - val_loss: 0.8186 - val_acc: 0.7992
Epoch 107/200
45000/45000 [==============================] - 9s - loss: 0.2631 - acc: 0.9160 - val_loss: 0.8686 - val_acc: 0.7920
Epoch 108/200
45000/45000 [==============================] - 9s - loss: 0.2524 - acc: 0.9178 - val_loss: 0.9136 - val_acc: 0.7956
Epoch 109/200
45000/45000 [==============================] - 9s - loss: 0.2569 - acc: 0.9151 - val_loss: 0.8148 - val_acc: 0.7994
Epoch 110/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9150 - val_loss: 0.8826 - val_acc: 0.7984
Epoch 111/200
45000/45000 [==============================] - 9s - loss: 0.2520 - acc: 0.9155 - val_loss: 0.8621 - val_acc: 0.7980
Epoch 112/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9157 - val_loss: 0.8149 - val_acc: 0.8038
Epoch 113/200
45000/45000 [==============================] - 9s - loss: 0.2623 - acc: 0.9151 - val_loss: 0.8361 - val_acc: 0.7972
Epoch 114/200
45000/45000 [==============================] - 9s - loss: 0.2535 - acc: 0.9177 - val_loss: 0.8618 - val_acc: 0.7970
Epoch 115/200
45000/45000 [==============================] - 8s - loss: 0.2570 - acc: 0.9164 - val_loss: 0.7687 - val_acc: 0.8044
Epoch 116/200
45000/45000 [==============================] - 9s - loss: 0.2501 - acc: 0.9183 - val_loss: 0.8270 - val_acc: 0.7934
Epoch 117/200
45000/45000 [==============================] - 8s - loss: 0.2535 - acc: 0.9182 - val_loss: 0.7861 - val_acc: 0.7986
Epoch 118/200
45000/45000 [==============================] - 9s - loss: 0.2507 - acc: 0.9184 - val_loss: 0.8203 - val_acc: 0.7996
Epoch 119/200
45000/45000 [==============================] - 9s - loss: 0.2530 - acc: 0.9173 - val_loss: 0.8294 - val_acc: 0.7904
Epoch 120/200
45000/45000 [==============================] - 9s - loss: 0.2599 - acc: 0.9160 - val_loss: 0.8458 - val_acc: 0.7902
Epoch 121/200
45000/45000 [==============================] - 9s - loss: 0.2483 - acc: 0.9164 - val_loss: 0.7573 - val_acc: 0.7976
Epoch 122/200
45000/45000 [==============================] - 8s - loss: 0.2492 - acc: 0.9190 - val_loss: 0.8435 - val_acc: 0.8012
Epoch 123/200
45000/45000 [==============================] - 9s - loss: 0.2528 - acc: 0.9179 - val_loss: 0.8594 - val_acc: 0.7964
Epoch 124/200
45000/45000 [==============================] - 9s - loss: 0.2581 - acc: 0.9173 - val_loss: 0.9037 - val_acc: 0.7944
Epoch 125/200
45000/45000 [==============================] - 8s - loss: 0.2404 - acc: 0.9212 - val_loss: 0.7893 - val_acc: 0.7976
Epoch 126/200
45000/45000 [==============================] - 8s - loss: 0.2492 - acc: 0.9177 - val_loss: 0.8679 - val_acc: 0.7982
Epoch 127/200
45000/45000 [==============================] - 8s - loss: 0.2483 - acc: 0.9196 - val_loss: 0.8894 - val_acc: 0.7956
Epoch 128/200
45000/45000 [==============================] - 9s - loss: 0.2539 - acc: 0.9176 - val_loss: 0.8413 - val_acc: 0.8006
Epoch 129/200
45000/45000 [==============================] - 8s - loss: 0.2477 - acc: 0.9184 - val_loss: 0.8151 - val_acc: 0.7982
Epoch 130/200
45000/45000 [==============================] - 9s - loss: 0.2586 - acc: 0.9188 - val_loss: 0.8173 - val_acc: 0.7954
Epoch 131/200
45000/45000 [==============================] - 9s - loss: 0.2498 - acc: 0.9189 - val_loss: 0.8539 - val_acc: 0.7996
Epoch 132/200
45000/45000 [==============================] - 9s - loss: 0.2426 - acc: 0.9190 - val_loss: 0.8543 - val_acc: 0.7952
Epoch 133/200
45000/45000 [==============================] - 9s - loss: 0.2460 - acc: 0.9185 - val_loss: 0.8665 - val_acc: 0.8008
Epoch 134/200
45000/45000 [==============================] - 9s - loss: 0.2436 - acc: 0.9216 - val_loss: 0.8933 - val_acc: 0.7950
Epoch 135/200
45000/45000 [==============================] - 8s - loss: 0.2468 - acc: 0.9203 - val_loss: 0.8270 - val_acc: 0.7940
Epoch 136/200
45000/45000 [==============================] - 9s - loss: 0.2479 - acc: 0.9194 - val_loss: 0.8365 - val_acc: 0.8052
Epoch 137/200
45000/45000 [==============================] - 9s - loss: 0.2449 - acc: 0.9206 - val_loss: 0.7964 - val_acc: 0.8018
Epoch 138/200
45000/45000 [==============================] - 9s - loss: 0.2440 - acc: 0.9220 - val_loss: 0.8784 - val_acc: 0.7914
Epoch 139/200
45000/45000 [==============================] - 9s - loss: 0.2485 - acc: 0.9198 - val_loss: 0.8259 - val_acc: 0.7852
Epoch 140/200
45000/45000 [==============================] - 9s - loss: 0.2482 - acc: 0.9204 - val_loss: 0.8954 - val_acc: 0.7960
Epoch 141/200
45000/45000 [==============================] - 9s - loss: 0.2344 - acc: 0.9249 - val_loss: 0.8708 - val_acc: 0.7874
Epoch 142/200
45000/45000 [==============================] - 9s - loss: 0.2476 - acc: 0.9204 - val_loss: 0.9190 - val_acc: 0.7954
Epoch 143/200
45000/45000 [==============================] - 9s - loss: 0.2415 - acc: 0.9223 - val_loss: 0.9607 - val_acc: 0.7960
Epoch 144/200
45000/45000 [==============================] - 9s - loss: 0.2377 - acc: 0.9232 - val_loss: 0.8987 - val_acc: 0.7970
Epoch 145/200
45000/45000 [==============================] - 9s - loss: 0.2481 - acc: 0.9201 - val_loss: 0.8611 - val_acc: 0.8048
Epoch 146/200
45000/45000 [==============================] - 9s - loss: 0.2504 - acc: 0.9197 - val_loss: 0.8411 - val_acc: 0.7938
Epoch 147/200
45000/45000 [==============================] - 9s - loss: 0.2450 - acc: 0.9216 - val_loss: 0.7839 - val_acc: 0.8028
Epoch 148/200
45000/45000 [==============================] - 9s - loss: 0.2327 - acc: 0.9250 - val_loss: 0.8910 - val_acc: 0.8054
Epoch 149/200
45000/45000 [==============================] - 9s - loss: 0.2432 - acc: 0.9219 - val_loss: 0.8568 - val_acc: 0.8000
Epoch 150/200
45000/45000 [==============================] - 9s - loss: 0.2436 - acc: 0.9236 - val_loss: 0.9061 - val_acc: 0.7938
Epoch 151/200
45000/45000 [==============================] - 9s - loss: 0.2434 - acc: 0.9222 - val_loss: 0.8439 - val_acc: 0.7986
Epoch 152/200
45000/45000 [==============================] - 9s - loss: 0.2439 - acc: 0.9225 - val_loss: 0.9002 - val_acc: 0.7994
Epoch 153/200
45000/45000 [==============================] - 8s - loss: 0.2373 - acc: 0.9237 - val_loss: 0.8756 - val_acc: 0.7880
Epoch 154/200
45000/45000 [==============================] - 8s - loss: 0.2359 - acc: 0.9238 - val_loss: 0.8514 - val_acc: 0.7936
Epoch 155/200
45000/45000 [==============================] - 9s - loss: 0.2435 - acc: 0.9222 - val_loss: 0.8377 - val_acc: 0.8080
Epoch 156/200
45000/45000 [==============================] - 9s - loss: 0.2478 - acc: 0.9204 - val_loss: 0.8831 - val_acc: 0.7992
Epoch 157/200
45000/45000 [==============================] - 9s - loss: 0.2337 - acc: 0.9253 - val_loss: 0.8453 - val_acc: 0.7994
Epoch 158/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9257 - val_loss: 0.9027 - val_acc: 0.7882
Epoch 159/200
45000/45000 [==============================] - 9s - loss: 0.2384 - acc: 0.9230 - val_loss: 0.9121 - val_acc: 0.8016
Epoch 160/200
45000/45000 [==============================] - 9s - loss: 0.2481 - acc: 0.9217 - val_loss: 0.9495 - val_acc: 0.7974
Epoch 161/200
45000/45000 [==============================] - 9s - loss: 0.2450 - acc: 0.9224 - val_loss: 0.8510 - val_acc: 0.7884
Epoch 162/200
45000/45000 [==============================] - 9s - loss: 0.2433 - acc: 0.9220 - val_loss: 0.8979 - val_acc: 0.7948
Epoch 163/200
45000/45000 [==============================] - 9s - loss: 0.2339 - acc: 0.9262 - val_loss: 0.8979 - val_acc: 0.7978
Epoch 164/200
45000/45000 [==============================] - 9s - loss: 0.2298 - acc: 0.9257 - val_loss: 0.9036 - val_acc: 0.7990
Epoch 165/200
45000/45000 [==============================] - 9s - loss: 0.2404 - acc: 0.9236 - val_loss: 0.8341 - val_acc: 0.8052
Epoch 166/200
45000/45000 [==============================] - 9s - loss: 0.2402 - acc: 0.9227 - val_loss: 0.8731 - val_acc: 0.7996
Epoch 167/200
45000/45000 [==============================] - 9s - loss: 0.2367 - acc: 0.9250 - val_loss: 0.9218 - val_acc: 0.7992
Epoch 168/200
45000/45000 [==============================] - 9s - loss: 0.2267 - acc: 0.9262 - val_loss: 0.8767 - val_acc: 0.7922
Epoch 169/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9254 - val_loss: 0.8418 - val_acc: 0.8038
Epoch 170/200
45000/45000 [==============================] - 9s - loss: 0.2434 - acc: 0.9232 - val_loss: 0.8362 - val_acc: 0.7920
Epoch 171/200
45000/45000 [==============================] - 9s - loss: 0.2328 - acc: 0.9265 - val_loss: 0.8712 - val_acc: 0.7950
Epoch 172/200
45000/45000 [==============================] - 9s - loss: 0.2346 - acc: 0.9262 - val_loss: 0.9256 - val_acc: 0.7976
Epoch 173/200
45000/45000 [==============================] - 8s - loss: 0.2382 - acc: 0.9242 - val_loss: 0.8875 - val_acc: 0.7982
Epoch 174/200
45000/45000 [==============================] - 9s - loss: 0.2400 - acc: 0.9239 - val_loss: 0.8264 - val_acc: 0.7864
Epoch 175/200
45000/45000 [==============================] - 9s - loss: 0.2334 - acc: 0.9261 - val_loss: 0.9178 - val_acc: 0.8014
Epoch 176/200
45000/45000 [==============================] - 9s - loss: 0.2427 - acc: 0.9219 - val_loss: 0.8458 - val_acc: 0.7920
Epoch 177/200
45000/45000 [==============================] - 9s - loss: 0.2310 - acc: 0.9257 - val_loss: 0.9171 - val_acc: 0.8062
Epoch 178/200
45000/45000 [==============================] - 9s - loss: 0.2310 - acc: 0.9265 - val_loss: 0.8544 - val_acc: 0.7990
Epoch 179/200
45000/45000 [==============================] - 9s - loss: 0.2378 - acc: 0.9240 - val_loss: 0.9259 - val_acc: 0.8000
Epoch 180/200
45000/45000 [==============================] - 9s - loss: 0.2381 - acc: 0.9242 - val_loss: 0.8573 - val_acc: 0.8056
Epoch 181/200
45000/45000 [==============================] - 9s - loss: 0.2231 - acc: 0.9297 - val_loss: 0.8935 - val_acc: 0.8002
Epoch 182/200
45000/45000 [==============================] - 9s - loss: 0.2419 - acc: 0.9248 - val_loss: 1.0145 - val_acc: 0.7900
Epoch 183/200
45000/45000 [==============================] - 9s - loss: 0.2336 - acc: 0.9266 - val_loss: 0.8838 - val_acc: 0.8006
Epoch 184/200
45000/45000 [==============================] - 9s - loss: 0.2429 - acc: 0.9242 - val_loss: 0.8685 - val_acc: 0.7918
Epoch 185/200
45000/45000 [==============================] - 9s - loss: 0.2317 - acc: 0.9260 - val_loss: 0.8297 - val_acc: 0.7942
Epoch 186/200
45000/45000 [==============================] - 9s - loss: 0.2330 - acc: 0.9264 - val_loss: 0.8831 - val_acc: 0.8026
Epoch 187/200
45000/45000 [==============================] - 9s - loss: 0.2353 - acc: 0.9254 - val_loss: 0.8934 - val_acc: 0.7956
Epoch 188/200
45000/45000 [==============================] - 9s - loss: 0.2312 - acc: 0.9247 - val_loss: 0.9275 - val_acc: 0.8042
Epoch 189/200
45000/45000 [==============================] - 9s - loss: 0.2239 - acc: 0.9282 - val_loss: 0.9246 - val_acc: 0.7934
Epoch 190/200
45000/45000 [==============================] - 9s - loss: 0.2349 - acc: 0.9253 - val_loss: 0.8628 - val_acc: 0.8000
Epoch 191/200
45000/45000 [==============================] - 9s - loss: 0.2313 - acc: 0.9266 - val_loss: 0.9020 - val_acc: 0.7978
Epoch 192/200
45000/45000 [==============================] - 9s - loss: 0.2358 - acc: 0.9254 - val_loss: 0.9481 - val_acc: 0.7966
Epoch 193/200
45000/45000 [==============================] - 9s - loss: 0.2298 - acc: 0.9276 - val_loss: 0.8791 - val_acc: 0.8010
Epoch 194/200
45000/45000 [==============================] - 9s - loss: 0.2279 - acc: 0.9265 - val_loss: 0.8890 - val_acc: 0.7976
Epoch 195/200
45000/45000 [==============================] - 9s - loss: 0.2330 - acc: 0.9273 - val_loss: 0.8893 - val_acc: 0.7890
Epoch 196/200
45000/45000 [==============================] - 9s - loss: 0.2416 - acc: 0.9243 - val_loss: 0.9002 - val_acc: 0.7922
Epoch 197/200
45000/45000 [==============================] - 9s - loss: 0.2309 - acc: 0.9273 - val_loss: 0.9232 - val_acc: 0.7990
Epoch 198/200
45000/45000 [==============================] - 9s - loss: 0.2247 - acc: 0.9278 - val_loss: 0.9474 - val_acc: 0.7980
Epoch 199/200
45000/45000 [==============================] - 9s - loss: 0.2335 - acc: 0.9256 - val_loss: 0.9177 - val_acc: 0.8000
Epoch 200/200
45000/45000 [==============================] - 9s - loss: 0.2378 - acc: 0.9254 - val_loss: 0.9205 - val_acc: 0.7966
 9984/10000 [============================>.] - ETA: 0s




[0.97292723369598388, 0.7853]

This model achieves an accuracy of ~78.6% on the test set; for such a difficult task (where human performance is only around 94%), and given the relative simplicity of this model, this is a respectable result. However, more sophisticated models have recently been able to get as far as 96.53%.

I appreciate that tinkering with this model might be cumbersome if you do not have access to a GPU. I would, however, encourage you to apply a similar model to the previously discussed MNIST dataset; you should be able to break 99.3% accuracy on its test set with little to no effort, using a CNN with dropout.

Conclusion

Throughout this post we have covered the essentials of convolutional neural networks, introduced the problem of overfitting, briefly touched on how it can be kept in check via regularisation (by applying dropout), and successfully implemented a four-layer deep CNN in Keras, applying it to CIFAR-10, all in under 50 lines of code.

Next time around, we will focus on some assorted topics, tips and tricks that should help you when fine-tuning models at this scale, and extracting more power out of your models while keeping overfitting in check.

ABOUT THE AUTHOR

Petar Veličković

Petar is currently a Research Assistant in Computational Biology within the Artificial Intelligence Group of the Cambridge University Computer Laboratory, where he is working on developing machine learning algorithms on complex networks, and their applications to bioinformatics. He is also a PhD student within the group, supervised by Dr Pietro Liò and affiliated with Trinity College. He holds a BA degree in Computer Science from the University of Cambridge, having completed the Computer Science Tripos in 2015.

Just show me the code!

from keras.datasets import cifar10 # subroutines for fetching the CIFAR-10 dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Convolution2D, MaxPooling2D, Dense, Dropout, Activation, Flatten
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
import numpy as np

batch_size = 32 # in each iteration, we consider 32 training examples at once
num_epochs = 200 # we iterate 200 times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth_1 = 32 # we will initially have 32 kernels per conv. layer...
conv_depth_2 = 64 # ...switching to 64 after the first pooling layer
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 512 # the FC layer will have 512 neurons

(X_train, y_train), (X_test, y_test) = cifar10.load_data() # fetch CIFAR-10 data

num_train, depth, height, width = X_train.shape # there are 50000 training examples in CIFAR-10 
num_test = X_test.shape[0] # there are 10000 test examples in CIFAR-10
num_classes = np.unique(y_train).shape[0] # there are 10 image classes

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= np.max(X_train) # Normalise data to [0, 1] range
X_test /= np.max(X_train) # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(depth, height, width)) # N.B. depth goes first in Keras!
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth_1, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
# Conv [64] -> Conv [64] -> Pool (with dropout on the pooling layer)
conv_3 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(drop_1)
conv_4 = Convolution2D(conv_depth_2, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_3)
pool_2 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_4)
drop_2 = Dropout(drop_prob_1)(pool_2)
# Now flatten to 1D, apply FC -> ReLU (with dropout) -> softmax
flat = Flatten()(drop_2)
hidden = Dense(hidden_size, activation='relu')(flat)
drop_3 = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop_3)

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!

Deep learning for complete beginners: neural network fine-tuning techniques by Cambridge Coding Academy | Download notebook

Introduction

Welcome to the third (and final) post in a series of blog posts designed to get you quickly up to speed with deep learning; from first principles, all the way to discussions of some of the intricate details, with the purpose of achieving respectable performance on two established machine learning benchmarks: MNIST (classification of handwritten digits) and CIFAR-10 (classification of small images across 10 distinct classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship & truck).

(Sample images from the MNIST and CIFAR-10 datasets)

Last time around, I introduced the convolutional neural network model, and illustrated how, combined with the simple but effective regularisation method of dropout, it may quickly achieve an accuracy level of 78.6% on CIFAR-10, leveraging the Keras deep learning framework.

By now, you have acquired the fundamental skills necessary to apply deep learning to most problems of interest (a notable exception, outside of the scope of these tutorials, is the problem of processing time-series of arbitrary length, for which a recurrent neural network (RNN) model is often preferable). In this tutorial, I will wrap up with an important but often overlooked aspect of tutorials such as this one—the tips and tricks for properly fine-tuning a model, to make it generalise better than the initial baseline you started out with.

This tutorial will, for the most part, assume familiarity with the previous two in the series.

Hyperparameter tuning and the baseline model

Typically, the design process for neural networks starts off by designing a simple network, either directly applying architectures that have shown successes for similar problems, or trying out hyperparameter values that generally seem effective. Eventually, we will hopefully attain performance values that seem like a nice baseline starting point, after which we may look into modifying every fixed detail in order to extract the maximal performance capacity out of the network. This is commonly known as hyperparameter tuning, because it involves modifying the components of the network which need to be specified before training.

While the methods described here can yield far more tangible improvements on CIFAR-10, due to the relative difficulty of rapid prototyping on it without a GPU, we will focus specifically on improving performance on the MNIST benchmark. Of course, I do invite you to have a go at applying methods like these to CIFAR-10 and see the kinds of gains you may achieve compared to the basic CNN approach, should your resources allow for it.

We will start off with the baseline CNN given below. If you find any aspects of this code unclear, I invite you to familiarise yourself with the previous two tutorials in the series—all the relevant concepts have already been introduced there.

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 12 # we iterate twelve times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth = 32 # use 32 kernels in both convolutional layers
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 128 # there will be 128 neurons in both hidden layers

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(X_train.shape[0], depth, height, width)
X_test = X_test.reshape(X_test.shape[0], depth, height, width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

inp = Input(shape=(depth, height, width)) # N.B. Keras expects channel dimension first
# Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer)
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(inp)
conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1)
pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
drop_1 = Dropout(drop_prob_1)(pool_1)
flat = Flatten()(drop_1)
hidden = Dense(hidden_size, activation='relu')(flat) # Hidden ReLU layer
drop = Dropout(drop_prob_2)(hidden)
out = Dense(num_classes, activation='softmax')(drop) # Output softmax layer

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

model.fit(X_train, Y_train, # Train the model using the training set...
          batch_size=batch_size, nb_epoch=num_epochs,
          verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation
model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 54000 samples, validate on 6000 samples
Epoch 1/12
54000/54000 [==============================] - 4s - loss: 0.3010 - acc: 0.9073 - val_loss: 0.0612 - val_acc: 0.9825
Epoch 2/12
54000/54000 [==============================] - 4s - loss: 0.1010 - acc: 0.9698 - val_loss: 0.0400 - val_acc: 0.9893
Epoch 3/12
54000/54000 [==============================] - 4s - loss: 0.0753 - acc: 0.9775 - val_loss: 0.0376 - val_acc: 0.9903
Epoch 4/12
54000/54000 [==============================] - 4s - loss: 0.0629 - acc: 0.9809 - val_loss: 0.0321 - val_acc: 0.9913
Epoch 5/12
54000/54000 [==============================] - 4s - loss: 0.0520 - acc: 0.9837 - val_loss: 0.0346 - val_acc: 0.9902
Epoch 6/12
54000/54000 [==============================] - 4s - loss: 0.0466 - acc: 0.9850 - val_loss: 0.0361 - val_acc: 0.9912
Epoch 7/12
54000/54000 [==============================] - 4s - loss: 0.0405 - acc: 0.9871 - val_loss: 0.0330 - val_acc: 0.9917
Epoch 8/12
54000/54000 [==============================] - 4s - loss: 0.0386 - acc: 0.9879 - val_loss: 0.0326 - val_acc: 0.9908
Epoch 9/12
54000/54000 [==============================] - 4s - loss: 0.0349 - acc: 0.9894 - val_loss: 0.0369 - val_acc: 0.9908
Epoch 10/12
54000/54000 [==============================] - 4s - loss: 0.0315 - acc: 0.9901 - val_loss: 0.0277 - val_acc: 0.9923
Epoch 11/12
54000/54000 [==============================] - 4s - loss: 0.0287 - acc: 0.9906 - val_loss: 0.0346 - val_acc: 0.9922
Epoch 12/12
54000/54000 [==============================] - 4s - loss: 0.0273 - acc: 0.9909 - val_loss: 0.0264 - val_acc: 0.9930
 9888/10000 [============================>.] - ETA: 0s




[0.026324689089493085, 0.99119999999999997]

As can be seen, our model achieves an accuracy level of 99.12% on the test set. This is slightly better than the MLP model explored in the first tutorial, but it should be easy to do even better!

The core of this tutorial will explore common ways in which a baseline neural network such as this one can often be improved (we will keep the basic CNN architecture fixed), after which we will evaluate the relative gains we have achieved.

L2 regularisation

As explained in detail in the previous tutorial, one of the primary pitfalls of machine learning is overfitting, where the model (sometimes catastrophically) sacrifices generalisation performance for the sake of minimising training loss.

Previously, we introduced dropout as a very simple way to keep overfitting in check.

There are several other common regularisers that we can apply to our networks. Arguably the most popular of them is L2 regularisation (sometimes also called weight decay), which takes a more direct approach than dropout to regularising. Namely, a common underlying cause of overfitting is that our model is too complex (in terms of parameter count) for the problem and training set size at hand. A regulariser, in this sense, aims to decrease the complexity of the model while keeping the parameter count the same. L2 regularisation does so by penalising weights with large magnitudes, minimising their L2 norm, using a hyperparameter λ to specify the relative importance of minimising the norm versus minimising the loss on the training set. Introducing this regulariser effectively adds a cost of $\frac{\lambda}{2}\|\vec{w}\|^2 = \frac{\lambda}{2}\sum_{i=0}^{W} w_i^2$ (the 1/2 factor is there just for nicer backpropagation updates) to the loss function $\mathcal{L}(\hat{\vec{y}}, \vec{y})$ (in our case, this was the cross-entropy loss).

Note that choosing λ properly is important. For too low values, the effect of the regulariser will be negligible, and for too high values, the optimal model will set all the weights to zero. We will set λ = 0.0001 here; to add this regulariser to our model, we need an additional import, after which it's as simple as adding a W_regularizer parameter to each layer we want to regularise:

from keras.regularizers import l2 # L2-regularisation
# ...
l2_lambda = 0.0001
# ...
# This is how to add L2-regularisation to any Keras layer with weights (e.g. Convolution2D/Dense)
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', W_regularizer=l2(l2_lambda), activation='relu')(inp)
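
To make the penalty term itself concrete, the following NumPy sketch (purely illustrative, with a made-up weight matrix) evaluates the λ/2·||w||² cost for a single layer, following the 1/2 convention in the formula above (the exact scaling convention can differ between libraries):

import numpy as np

l2_lambda = 0.0001
W = np.random.randn(128, 10) * 0.05            # made-up weight matrix, for illustration only
penalty = 0.5 * l2_lambda * np.sum(W ** 2)     # the extra cost added to the loss for this layer
print(penalty)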

Network initialisation

One issue that was completely overlooked in previous tutorials (and a lot of other existing tutorials as well!) is the policy used for assigning initial weights to the various layers within the network. Clearly, this is a very important issue: simply initialising all weights to zero, for example, would significantly impede learning given that no weight would initially be active. Uniform initialisation between ±1 is also not typically the best way to go; in fact, sometimes (depending on problem and model complexity) choosing the proper initialisation for each layer could mean the difference between superb performance and achieving no convergence at all! Even if the problem does not pose such issues, initialising the weights in an appropriate way can significantly influence how easily the network learns from the training set (as it effectively preselects the initial position of the model parameters with respect to the loss function to optimise).

Here I will mention two schemes that are of particular interest:
– Xavier (sometimes Glorot) initialisation: The key idea behind this initialisation scheme is to make it easier for a signal to pass through the layer during forward as well as backward propagation, for a linear activation (this also works nicely for sigmoid activations, because the interval where they are unsaturated is roughly linear as well). It draws weights from a probability distribution (uniform or normal) with variance $\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}$, where $n_{in}$ and $n_{out}$ are the numbers of neurons in the previous and next layer, respectively.
– He initialisation: This scheme is a version of the Xavier initialisation more suitable for ReLU activations, compensating for the fact that this activation is zero for half of the possible input space. Namely, $\mathrm{Var}(W) = \frac{2}{n_{in}}$ in this case.

In order to derive the required variance for the Xavier initialisation, consider what happens to the variance of the output of a linear neuron (ignoring the bias term), based on the variance of its inputs, assuming that the weights and inputs are uncorrelated, and are both zero-mean:

$$\mathrm{Var}\left(\sum_{i=1}^{n_{in}} w_i x_i\right) = \sum_{i=1}^{n_{in}} \mathrm{Var}(w_i x_i) = \sum_{i=1}^{n_{in}} \mathrm{Var}(W)\,\mathrm{Var}(X) = n_{in}\,\mathrm{Var}(W)\,\mathrm{Var}(X)$$

This implies that, in order to preserve the variance of the input after passing through the layer, it must hold that $\mathrm{Var}(W) = \frac{1}{n_{in}}$. We may apply a similar argument to the backpropagation update to get that $\mathrm{Var}(W) = \frac{1}{n_{out}}$. As we cannot typically satisfy these two constraints simultaneously, we set the variance of the weights to their average (i.e. $\mathrm{Var}(W) = \frac{2}{n_{in} + n_{out}}$), which usually works quite well in practice.
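
If you would like to convince yourself of this numerically, the following NumPy sketch (an illustration, not part of the tutorial's code) checks that Xavier-initialised weights approximately preserve the variance of a zero-mean input through a linear layer, and that He-initialised weights preserve the second moment through a ReLU layer:

import numpy as np

n_in, n_out = 512, 512
x = np.random.randn(10000, n_in)                          # zero-mean, unit-variance inputs

W_xavier = np.random.randn(n_in, n_out) * np.sqrt(2.0 / (n_in + n_out))
print(np.var(x.dot(W_xavier)))                            # ~1 for a linear layer (here n_in == n_out)

W_he = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)
h = np.maximum(0, x.dot(W_he))                            # ReLU activation
print(np.mean(h ** 2))                                    # second moment stays ~1, as the He scheme intends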

These two schemes will be sufficient for most examples you will encounter (although the orthogonal initialisation is also worth investigating in some cases, particularly when initialising recurrent neural networks). Adding a custom initialisation to a layer is simple: you only need to specify an init parameter for it, as described below. We will be using the uniform He initialisation (he_uniform) for all ReLU layers and the uniform Xavier initialisation (glorot_uniform) for the output softmax layer (as it is effectively a generalisation of the logistic function for multiple inputs).

# Add He initialisation to a layer
conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp)
# Add Xavier initialisation to a layer
out = Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)

Batch normalisation

If there’s one technique I would like you to pick up and readily use upon reading this tutorial, it has to be batch normalisation, a method for speeding up deep learning pioneered by Ioffe and Szegedy in early 2015, already accumulating 560 citations on arXiv! It is based on a really simple failure mode that impedes efficient training of deep networks: as the signal propagates through the network, even if we normalised it at the input, by the time it reaches an intermediate hidden layer it may well end up completely skewed in both mean and variance properties (an effect called internal covariate shift by the original authors), meaning that there will be potentially severe discrepancies between gradient updates across different layers. This requires us to be more conservative with our learning rate, and to apply stronger regularisers, significantly slowing down learning.

Batch normalisation’s answer to this is really quite simple: normalise the activations to a layer to zero mean and unit variance, across the current batch of data being passed through the network (this means that, during training, we normalise across batch_size examples, and during testing, we normalise using statistics derived from the entire training set, as the testing data cannot be seen upfront). Namely, we compute the mean and variance statistics for a particular batch of activations $B = \{x_1, \ldots, x_m\}$ as follows:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

We then use these statistics to transform the activations so that they have zero mean and unit variance across the batch, as follows:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$$

where ε > 0 is a small “fuzz” parameter designed to protect us from dividing by zero (in the event that the batch standard deviation is very small or even zero). Finally, to obtain the final activations y, we need to make sure that we haven’t lost any generalisation properties by performing the normalisation; since the operations we performed on the original data were a scale and a shift, we allow for an arbitrary scale and shift on the normalised values to obtain the final activations (this allows the network, for example, to fall back to the original values if it finds this to be more useful):

$$y_i = \gamma \hat{x}_i + \beta$$

where β and γ are trainable parameters of the batch normalisation operation (which can be optimised via gradient descent on the training data). This generalisation also means that batch normalisation can often be usefully applied directly to the inputs of a neural network (given that the presence of these parameters allows the network to assume a different input statistic to the one we selected through manual preprocessing of the data).
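
To make the transformation above concrete, here is a minimal NumPy sketch of the batch normalisation forward pass during training (illustrative only; Keras’s BatchNormalization layer implements this for you, along with the running statistics used at test time):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: activations for one batch, shape (batch_size, num_features)
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learned scale and shift

batch = np.random.randn(32, 128) * 3.0 + 5.0                 # deliberately skewed activations
out = batchnorm_forward(batch, np.ones(128), np.zeros(128))
print(out.mean(), out.std())                                 # ~0 and ~1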

This method, when applied to the layers within a deep convolutional neural network, almost always achieves significant success with its original design goal of speeding up training. Even further, it acts as a great regulariser, allowing us to be far more careless with our choice of learning rate, L2 regularisation strength and use of dropout (sometimes making it completely unnecessary). This regularisation occurs as a consequence of the fact that the output of the network for a single example is no longer deterministic (it depends on the entire batch within which it is contained), helping the network generalise more easily.

A final piece of analysis: while the authors of batch normalisation suggest performing it before applying the activation function of the neuron (on the computed linear combinations of the input data), recently published results suggest that it might be more beneficial (and at least as good) to do it after, which is what we will be doing within this tutorial.

Adding batch normalisation to our network is simple in Keras: it is expressed by a BatchNormalization layer, to which we may provide a few parameters, the most important of which is axis (the axis of the data along which the statistics should be computed). Specifically, when dealing with the inputs to convolutional layers, we would usually like to normalise across individual channels, and therefore we set axis = 1.

from keras.layers.normalization import BatchNormalization # batch normalisation
# ...
inp_norm = BatchNormalization(axis=1)(inp) # apply BN to the input (N.B. need to rename here)
# conv_1 = Convolution2D(...)(inp_norm)
conv_1 = BatchNormalization(axis=1)(conv_1) # apply BN to the first conv layer

Data augmentation

While the previously discussed methods have all tuned the model specification, it is often useful to consider data-driven fine-tuning as well—especially when dealing with image recognition tasks.

Imagine that we trained a neural network on handwritten digits which all, roughly, had the same bounding box, and were nicely oriented. Now consider what happens when someone presents the network with a slightly shifted, scaled and rotated version of a training image to test on: its confidence in the correct class is bound to drop. We would ideally want to instruct the model to remain invariant under feasible levels of such distortions, but our model can only learn from the samples we provided to it, given that it performs a kind of statistical analysis and extrapolation from the training set!

Luckily, there is a very simple remedy to this problem which is often quite effective, especially on image recognition tasks: artificially augment the data with distorted versions during training! This means that, prior to feeding an example to the network for training, we will apply any transformations to it that we find appropriate, thereby allowing the network to directly observe the effects of applying them to the data and to behave better on such examples. For illustration purposes, here are a few shifted/scaled/sheared/rotated examples of MNIST digits:

(Example augmented MNIST digits: shift, shift, shear, shift & scale, rotate & scale)

Keras provides a really nice interface to image data augmentation by way of the ImageDataGenerator class. We initialise the class by providing it with the kinds of transformations we want performed to every image, and then feed our training data through the generator, by way of performing a call to its fit method followed by its flow method, returning an (infinitely-extending) iterator across augmented batches. There is even a custom model.fit_generator method which will directly perform training of our model using this iterator, simplifying the code significantly! A slight downside is that we now lose the validation_split parameter, meaning we have to separate the validation dataset ourselves, but this adds only four extra lines of code.

For this tutorial, we will apply random horizontal and vertical shifts to the data. ImageDataGenerator also provides us with methods for applying random rotations, scales, shears and flips. These would all also be sensible transformations to attempt, except for the flips, since we never expect to receive flipped handwritten digits from a person.

from keras.preprocessing.image import ImageDataGenerator # data augmentation
# ... after model.compile(...)
# Explicitly split the training and validation sets
X_val = X_train[54000:]
Y_val = Y_train[54000:]
X_train = X_train[:54000]
Y_train = Y_train[:54000]

datagen = ImageDataGenerator(
            width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1) # randomly shift images vertically (fraction of total height)
datagen.fit(X_train)

# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1)

Ensembles

One interesting aspect of neural networks that you might observe when using them for classification (on more than two classes) is that, when trained from different initial conditions, they will express better discriminative properties for different classes, and will tend to get more confused on others. On the MNIST example, you might find that a single trained network becomes very good at distinguishing a three from a five, but as a consequence does not learn to distinguish ones from sevens properly; while another trained network could well do the exact opposite.

This discrepancy may be exploited through an ensemble method—rather than building just one model, build several copies of it (with different initial values), and average their predictions on a particular input to obtain the final answer. Here we will perform this across three separate models. The difference between the two architectures can be easily visualised by a diagram such as the one below (plotted within Keras!):

(Architecture diagrams: the baseline model with batch normalisation, and the three-model ensemble)

Keras once again provides us with a way to effectively do this at minimal expense to code length – we may wrap the constituent models’ constructions in a loop, extracting just their outputs for a final three-way merge layer.

from keras.layers import merge # for merging predictions in an ensemble
# ...
ens_models = 3 # we will train three separate models on the data
# ...
inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (N.B. need to rename here)

outs = [] # the list of ensemble outputs
for i in range(ens_models):
    # conv_1 = Convolution2D(...)(inp_norm)
    # ...
    outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer

out = merge(outs, mode='ave') # average the predictions to obtain the final output

Early stopping

I will discuss one further method here, as an introduction to a much wider area of hyperparameter optimisation. Namely, thus far we have utilised the validation dataset just to monitor training progress, which is arguably wasteful (given that we don’t do anything constructive with this data, other than observe successive losses on it). In fact, validation data represents the primary platform for evaluating hyperparameters of the network (such as depth, neuron/kernel numbers, regularisation factors, etc). We may imagine running our network with different combinations of hyperparameters we wish to optimise, and then basing our decisions on their performance on the validation set. Keep in mind that we may not observe the test dataset until we have irrevocably committed ourselves to all hyperparameters, given that otherwise features of the test set may inadvertently flow into the training procedure! This is sometimes known as the golden rule of machine learning, and breaking it was a common failure of many early approaches.

Perhaps the simplest use of the validation set is for tuning the number of epochs, through a procedure known as early stopping; simply stop training once the validation loss hasn’t decreased for a fixed number of epochs (a parameter known as patience). As this is a relatively small benchmark which saturates quickly, we will have a patience of five epochs, and increase the upper bound on epochs to 50 (which will likely never be reached).

Keras supports early stopping through an EarlyStopping callback class. Callbacks are methods that are called after each epoch of training, upon supplying a callbacks parameter to the fit or fit_generator method of the model. As usual, this is very concise, adding only a single line to our program.

from keras.callbacks import EarlyStopping
# ...
num_epochs = 50 # we iterate at most fifty times over the entire training set
# ...
# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1,
                        callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping

Just show me the code! (also, how well does it do?)

With these six techniques applied to our original baseline, the final version of your code should look something like the following:

from keras.datasets import mnist # subroutines for fetching the MNIST dataset
from keras.models import Model # basic class for specifying and training a neural network
from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout, merge
from keras.utils import np_utils # utilities for one-hot encoding of ground truth values
from keras.regularizers import l2 # L2-regularisation
from keras.layers.normalization import BatchNormalization # batch normalisation
from keras.preprocessing.image import ImageDataGenerator # data augmentation
from keras.callbacks import EarlyStopping # early stopping

batch_size = 128 # in each iteration, we consider 128 training examples at once
num_epochs = 50 # we iterate at most fifty times over the entire training set
kernel_size = 3 # we will use 3x3 kernels throughout
pool_size = 2 # we will use 2x2 pooling throughout
conv_depth = 32 # use 32 kernels in both convolutional layers
drop_prob_1 = 0.25 # dropout after pooling with probability 0.25
drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5
hidden_size = 128 # there will be 128 neurons in both hidden layers
l2_lambda = 0.0001 # use 0.0001 as a L2-regularisation factor
ens_models = 3 # we will train three separate models on the data

num_train = 60000 # there are 60000 training examples in MNIST
num_test = 10000 # there are 10000 test examples in MNIST

height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data

X_train = X_train.reshape(X_train.shape[0], depth, height, width)
X_test = X_test.reshape(X_test.shape[0], depth, height, width)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

# Explicitly split the training and validation sets
X_val = X_train[54000:]
Y_val = Y_train[54000:]
X_train = X_train[:54000]
Y_train = Y_train[:54000]

inp = Input(shape=(depth, height, width)) # N.B. Keras expects channel dimension first
inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (N.B. need to rename here)

outs = [] # the list of ensemble outputs
for i in range(ens_models):
    # Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer), applying BN in between
    conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp_norm)
    conv_1 = BatchNormalization(axis=1)(conv_1)
    conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(conv_1)
    conv_2 = BatchNormalization(axis=1)(conv_2)
    pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2)
    drop_1 = Dropout(drop_prob_1)(pool_1)
    flat = Flatten()(drop_1)
    hidden = Dense(hidden_size, init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(flat) # Hidden ReLU layer
    hidden = BatchNormalization(axis=1)(hidden)
    drop = Dropout(drop_prob_2)(hidden)
    outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer

out = merge(outs, mode='ave') # average the predictions to obtain the final output

model = Model(input=inp, output=out) # To define a model, just specify its input and output layers

model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function
              optimizer='adam', # using the Adam optimiser
              metrics=['accuracy']) # reporting the accuracy

datagen = ImageDataGenerator(
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1)  # randomly shift images vertically (fraction of total height)
datagen.fit(X_train)

# fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit
model.fit_generator(datagen.flow(X_train, Y_train,
                        batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=num_epochs,
                        validation_data=(X_val, Y_val),
                        verbose=1,
                        callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping

model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Epoch 1/50
54000/54000 [==============================] - 30s - loss: 0.3487 - acc: 0.9031 - val_loss: 0.0579 - val_acc: 0.9863
Epoch 2/50
54000/54000 [==============================] - 30s - loss: 0.1441 - acc: 0.9634 - val_loss: 0.0424 - val_acc: 0.9890
Epoch 3/50
54000/54000 [==============================] - 30s - loss: 0.1126 - acc: 0.9716 - val_loss: 0.0405 - val_acc: 0.9887
Epoch 4/50
54000/54000 [==============================] - 30s - loss: 0.0929 - acc: 0.9757 - val_loss: 0.0390 - val_acc: 0.9890
Epoch 5/50
54000/54000 [==============================] - 30s - loss: 0.0829 - acc: 0.9788 - val_loss: 0.0329 - val_acc: 0.9920
Epoch 6/50
54000/54000 [==============================] - 30s - loss: 0.0760 - acc: 0.9807 - val_loss: 0.0315 - val_acc: 0.9917
Epoch 7/50
54000/54000 [==============================] - 30s - loss: 0.0740 - acc: 0.9824 - val_loss: 0.0310 - val_acc: 0.9917
Epoch 8/50
54000/54000 [==============================] - 30s - loss: 0.0679 - acc: 0.9826 - val_loss: 0.0297 - val_acc: 0.9927
Epoch 9/50
54000/54000 [==============================] - 30s - loss: 0.0663 - acc: 0.9834 - val_loss: 0.0300 - val_acc: 0.9908
Epoch 10/50
54000/54000 [==============================] - 30s - loss: 0.0658 - acc: 0.9833 - val_loss: 0.0281 - val_acc: 0.9923
Epoch 11/50
54000/54000 [==============================] - 30s - loss: 0.0600 - acc: 0.9844 - val_loss: 0.0272 - val_acc: 0.9930
Epoch 12/50
54000/54000 [==============================] - 30s - loss: 0.0563 - acc: 0.9857 - val_loss: 0.0250 - val_acc: 0.9923
Epoch 13/50
54000/54000 [==============================] - 30s - loss: 0.0530 - acc: 0.9862 - val_loss: 0.0266 - val_acc: 0.9925
Epoch 14/50
54000/54000 [==============================] - 31s - loss: 0.0517 - acc: 0.9865 - val_loss: 0.0263 - val_acc: 0.9923
Epoch 15/50
54000/54000 [==============================] - 30s - loss: 0.0510 - acc: 0.9867 - val_loss: 0.0261 - val_acc: 0.9940
Epoch 16/50
54000/54000 [==============================] - 30s - loss: 0.0501 - acc: 0.9871 - val_loss: 0.0238 - val_acc: 0.9937
Epoch 17/50
54000/54000 [==============================] - 30s - loss: 0.0495 - acc: 0.9870 - val_loss: 0.0246 - val_acc: 0.9923
Epoch 18/50
54000/54000 [==============================] - 31s - loss: 0.0463 - acc: 0.9877 - val_loss: 0.0271 - val_acc: 0.9933
Epoch 19/50
54000/54000 [==============================] - 30s - loss: 0.0472 - acc: 0.9877 - val_loss: 0.0239 - val_acc: 0.9935
Epoch 20/50
54000/54000 [==============================] - 30s - loss: 0.0446 - acc: 0.9885 - val_loss: 0.0226 - val_acc: 0.9942
Epoch 21/50
54000/54000 [==============================] - 30s - loss: 0.0435 - acc: 0.9890 - val_loss: 0.0218 - val_acc: 0.9947
Epoch 22/50
54000/54000 [==============================] - 30s - loss: 0.0432 - acc: 0.9889 - val_loss: 0.0244 - val_acc: 0.9928
Epoch 23/50
54000/54000 [==============================] - 30s - loss: 0.0419 - acc: 0.9893 - val_loss: 0.0245 - val_acc: 0.9943
Epoch 24/50
54000/54000 [==============================] - 30s - loss: 0.0423 - acc: 0.9890 - val_loss: 0.0231 - val_acc: 0.9933
Epoch 25/50
54000/54000 [==============================] - 30s - loss: 0.0400 - acc: 0.9894 - val_loss: 0.0213 - val_acc: 0.9938
Epoch 26/50
54000/54000 [==============================] - 30s - loss: 0.0384 - acc: 0.9899 - val_loss: 0.0226 - val_acc: 0.9943
Epoch 27/50
54000/54000 [==============================] - 30s - loss: 0.0398 - acc: 0.9899 - val_loss: 0.0217 - val_acc: 0.9945
Epoch 28/50
54000/54000 [==============================] - 30s - loss: 0.0383 - acc: 0.9902 - val_loss: 0.0223 - val_acc: 0.9940
Epoch 29/50
54000/54000 [==============================] - 31s - loss: 0.0382 - acc: 0.9898 - val_loss: 0.0229 - val_acc: 0.9942
Epoch 30/50
54000/54000 [==============================] - 31s - loss: 0.0379 - acc: 0.9900 - val_loss: 0.0225 - val_acc: 0.9950
Epoch 31/50
54000/54000 [==============================] - 30s - loss: 0.0359 - acc: 0.9906 - val_loss: 0.0228 - val_acc: 0.9943
10000/10000 [==============================] - 2s
[0.017431972888592554, 0.99470000000000003]

Our updated model achieves an accuracy of 99.47% on the test set, a significant improvement over the baseline performance of 99.12%. Of course, for such a small and (comparatively) simple problem as MNIST, the gains might not immediately seem that important. Applying these techniques to problems like CIFAR-10 (provided you have sufficient resources) can yield much more tangible benefits.

I invite you to work on this model even further: specifically, take advantage of the validation data to do more than just early stopping: use it to evaluate various kernel sizes/counts, hidden layer sizes, optimisation strategies, activation functions, numbers of networks in the ensemble, etc., and see how you compare to the best of the best (at the time of writing this post, the top-ranked model achieves an accuracy of 99.79% on the MNIST test set). A minimal sketch of such a validation-driven search is given below.
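As a minimal sketch of such a search (mine, not part of the original tutorial), assume a hypothetical build_model helper that constructs the architecture above with a configurable hidden layer size, and reuse the X_train/Y_train/X_val/Y_val split from the final listing; the short ten-epoch runs are purely for comparison:

best_loss, best_size = None, None
for candidate_size in [64, 128, 256]: # candidate hidden layer sizes to compare
    model = build_model(hidden_size=candidate_size) # hypothetical helper wrapping the architecture above
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=10, verbose=0)
    val_loss, val_acc = model.evaluate(X_val, Y_val, verbose=0) # judge only on validation data
    if best_loss is None or val_loss < best_loss:
        best_loss, best_size = val_loss, candidate_size
print('Best hidden layer size on validation data:', best_size)

Only after committing to the winning configuration would we touch the test set, in keeping with the golden rule above.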

Conclusion

Throughout this post we have covered six methods that can help further fine-tune the kinds of deep neural networks discussed in the previous two tutorials:
– L2 regularisation
– Initialisation
– Batch normalisation
– Data augmentation
– Ensemble methods
– Early stopping

and successfully applied them to a baseline deep CNN model within Keras, achieving a significant improvement on MNIST, all in under 90 lines of code.

This is also the final topic of the series. I hope that what you’ve learnt here is enough to provide you with an initial drive that, combined with appropriate targeted resources, should see you become a bleeding-edge deep learning engineer in no time!

If you have any feedback on the series as a whole, wishes for future tutorials, or would just like to say hello, feel free to email me: petar [dot] velickovic [at] cl [dot] cam [dot] ac [dot] uk. Hopefully, there will be more tutorials to come from me on this subject, perhaps focusing on topics such as recurrent neural networks or deep reinforcement learning.

Thank you!

ABOUT THE AUTHOR

Petar Veličković

Petar is currently a Research Assistant in Computational Biology within the Artificial Intelligence Group of the Cambridge University Computer Laboratory, where he is working on developing machine learning algorithms on complex networks, and their applications to bioinformatics. He is also a PhD student within the group, supervised by Dr Pietro Liò and affiliated with Trinity College. He holds a BA degree in Computer Science from the University of Cambridge, having completed the Computer Science Tripos in 2015.

 

Exploring Deep Learning on Satellite Data


This is a guest post, originally posted at the Fast Forward Labs blog, by Patrick Doupe, now at the Arnhold Institute for Global Health. In this post Patrick describes his Insight project, undertaken in consultation with Fast Forward Labs during the Winter 2016 NYC Data Science session.

Machines are getting better at identifying objects in images. These technologies are used to do more than organize your photos or chat with your family and friends using snappy augmented pictures and movies. Some companies are using them to better understand how the world works. Be it by improving forecasts of Chinese economic growth from satellite images of construction sites or by estimating deforestation, algorithms and data can help provide useful information about the current and future states of society.

In early 2016, I developed a prototype of a model to predict population from satellite images. This extends existing classification tasks, which ask whether something exists in an image. In my prototype, I ask how much of something not directly visible is in an image. The regression task is difficult; current advice is to turn any regression problem into a classification task. But I wanted to aim higher. After all, satellite images appear different across populated and non-populated areas.

[Figures: a populated region and an empty region]

The prototype was developed in conjunction with Fast Forward Labs, as my project in the Insight Data Science program. I trained convolutional neural networks on LANDSAT satellite imagery to predict Census population estimates. I also learned all of this, from understanding what a convolutional neural network is, to dealing with satellite images, to building a website, with four weeks of support and mentorship from Fast Forward Labs and Insight Data Science. If I can do this within a few weeks, your data scientists too can take your project from idea to prototype in a short amount of time.

LANDSAT-landstats

Counting people is an important task. We need to know where people are to provide government services like health care and to develop infrastructure like school buildings. There are also constitutional reasons for a Census, which I’ll leave to Sam Seaborn.

We typically get this information from a Census or other government surveys like the American Community Survey. These are not perfect measures. For example, the inaccuracies are biased against those who are likely to use government services.

If we could develop a model that could estimate the population well at the community level, we could help government services better target those in need. The model could also help governments facing resource constraints that prevent them from running a census. Also, if it works for counting humans, then maybe it could work for estimating other socio-economic statistics, and maybe even help provide universal internet access. So much promise!

So much reality

Satellite images are huge. To keep the project manageable I chose two US States that are similar in their environmental and human landscape; one State for model training and another for model testing. Oregon and Washington seemed to fit the bill. Since these states were chosen based on their similarity, I thought I would stretch the model by choosing a very different state as a tougher test. I’m from Victoria, Australia, so I chose this glorious region.

Satellite images are also messy and full of interference. To minimize this issue and focus on the model, I chose the LANDSAT Top Of Atmosphere (TOA) annual composite satellite image for 2010. This image is already stitched together from satellite images with minimal interference. I obtained the satellite images from the Google Earth Engine. I began with low resolution images (1km) and lowered my resolution in each iteration of the model.

For the Census estimates, I wanted the highest spatial resolution, which is the Census block. A typical Census block contains between 600 and 3,000 people, or about a city block. To combine these datasets I assigned each pixel its geographic coordinates and merged each pixel with its census population estimate using various Python geospatial tools. This took enough time that I dropped the bigger plans; better to get something complete than a half-baked idea.
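As a rough illustration (not the author's actual code) of that pixel-to-census-block merge, a sketch using rasterio and geopandas might look like the following; the file names are placeholders, and both datasets are assumed to already share the same coordinate reference system:

import rasterio
import geopandas as gpd
from shapely.geometry import Point

with rasterio.open('landsat_composite.tif') as src:
    band = src.read(1)
    records = []
    for row in range(src.height): # slow pure-Python loop, but fine for a prototype
        for col in range(src.width):
            lon, lat = src.xy(row, col) # coordinates of the pixel centre
            records.append({'geometry': Point(lon, lat), 'intensity': band[row, col]})

pixels = gpd.GeoDataFrame(records)
blocks = gpd.read_file('census_blocks.shp') # block polygons with a population column
merged = gpd.sjoin(pixels, blocks, how='left', op='within') # each pixel inherits its block's population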

A very high level overview of training Convolutional Neural Networks

The problem I faced is a classic supervised learning problem: train a model on satellite images to predict census data. Then I could use standard methods, like linear regression or neural networks. For every pixel there is a number corresponding to the intensity of various light bandwidths. The number of features then equals the number of bands times the number of pixels. Sure, we could do some more complicated feature engineering, but the basic idea could work, right?

Not really. You see, a satellite image is not a collection of independent pixels. Each pixel is connected to other pixels and this connection has meaning. A mountain range is connected across pixels and human built infrastructure is connected across pixels. We want to retain this information. Instead of modeling pixels independently, we need to model pixels in connection with their neighbors.

Convolutional neural networks (hereafter, “convnets”) do exactly this. These networks are extremely powerful at image classification, with many models reporting better accuracy than humans. For the problem of estimating population numbers, we can swap out the loss function and run a regression, as sketched below.
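As a minimal sketch of that idea (my own, in the Keras 1.x style used elsewhere on this blog, not the project's actual architecture), the network keeps a convolutional feature extractor but ends in a single linear unit trained with a regression loss; the input shape is an assumption:

from keras.models import Model
from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, Dense

inp = Input(shape=(1, 64, 64)) # hypothetical single-band satellite tile
x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inp)
x = MaxPooling2D((2, 2))(x)
x = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
out = Dense(1, activation='linear')(x) # single real-valued output, e.g. log population density

model = Model(input=inp, output=out)
model.compile(loss='mse', optimizer='adam') # regression loss instead of cross-entropy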

Diagram of a simple convolutional neural network processing an input image. From Fast Forward Labs report on Deep Learning: Image Analysis

Training the model

Unfortunately convnets can be hard to train. First, there are a lot of parameters to set in a convnet: how many convolutional layers? Max-pooling or average-pooling? How do I initialize my weights? Which activations? It’s super easy to get overwhelmed. My contact at Fast Forward Labs suggested using VGGNet as a starting base for a model. For other parameters, I based the network on what seemed to be the current best practices. I learned these by following this winter’s convnet course at Stanford.

Second, convnets take a lot of time and data to train; we’re talking weeks here. I want fast results for a prototype. One option is to use pre-trained models, like those available at the Caffe model zoo. I was writing my model using the Keras python library, which at present doesn’t have as large a zoo of models. Instead, I chose to use a smaller model and see if the results pointed in a promising direction.

Results

To validate the model, I used data from Washington, USA and Victoria, Australia. I show the model’s accuracy in the following scatter plots of the model’s predictions against reality. The unit of observation is the small image tile used by the network, and I estimate the population density in each image. Since each image covers roughly the same area (complication: the earth is round), this is the same as estimating population. Lastly, the data is quasi log-normalized.

Phew. Let’s start with Washington:

We see that the model is picking up the signal: higher actual population densities are associated with higher model predictions. Also noticeable is that the model struggles to estimate regions of zero population density. The R^2 of the model is 0.74; that is, the model explains about 74 percent of the spatial variation in population. This is up from 26 percent during the four weeks at Insight.

A harder test is a region like Victoria, with a different natural and built environment. The scatter plot shows the reduced performance. The model’s inability to pick out regions of low population is more apparent here: not only does the model struggle with areas of zero population, it predicts higher population for low population areas. Nevertheless, with an R^2 of 0.63, the overall fit is good for a harder test.

An interesting outcome is that the regression estimates are quite similar for both Washington and Victoria: the model consistently underestimates reality. In sample, we also have a model that underestimates population. Given that the images are unlikely to contain enough information to identify human settlements at the current resolution, it’s understandable that the model struggles to estimate population in these regions.

Model results

Conclusion

LANDSAT-landstats was an experiment to see if convnets could estimate objects they couldn’t ‘see.’ Given project complexity, the timeframe, and my limited understanding of the algorithms at the outset, the results are promising. We’re not at a stage to provide precise estimates of a region’s population, but with improved image resolution and advances in our understanding of convnets, we may not be far away.



This startup uses machine learning and satellite imagery to predict crop yields


Artificial intelligence + nanosatellites + corn


Cloudless: Open Source Deep Learning Pipeline for Orbital Satellite Data


Introduction

I’m proud to announce the 1.0 release of Cloudless, an open source computer vision pipeline for orbital satellite data, powered by data from Planet Labs and using deep learning under the covers. This blog post contains details and a technical report on the project.


The Cloudless project was born during Dropbox’s week-long Hack Week in August 2015 with contributions from Johann Hauswald, Max Nova, and myself; I’ve continued working on the project post-Hack Week to bring it to a 1.0 release, generally during the weekly colearning groups I run.

What is Planet Labs?


Figure 1 Two Planet Labs “Doves” in orbit

Planet Labs is a startup company that has launched fleets of nanosats, each about the size of a shoebox, to image much of the earth daily. They’ve launched about eighty of their nanosats, called Doves, into Low Earth Orbit, both from the International Space Station (ISS) and as secondary payloads on other launches. Doves are built from commercially available hardware and can be placed into orbit quickly, resulting in much lower prices per satellite and allowing much more rapid iteration in what is known as “agile aerospace.” Planet Labs’ approach is in contrast to traditional geospatial imaging satellites, which cost billions of dollars, take five to ten years to develop, and are launched into geosynchronous orbit.


Figure 2 Two Planet Labs Doves being launched from the ISS

Motivation

Once Planet Labs’ full fleet is deployed, 90% of the earth’s surface will be imaged once daily. This will result in a flood of visual data greater than any person can efficiently process. Can we have computers do the detection and localization of items in the orbital satellite data instead?

Interesting applications of automated visual satellite detection include counting the number of cars in Walmart parking lots as a correlate of daily consumer demand, or detecting deforestation from illegal logging operations. However, Planet Labs’ resolution is currently three to five meters, which is not fine enough for counting cars, and publicly available deforestation data sets are not yet suitable.

For this reason we focused on automated cloud detection, hence the name Cloudless (which is also a pun on the project originating during Hack Week at Dropbox, a cloud-based company; who says deep learning can’t be funny?).

Detecting clouds in orbital satellite imagery so they can be ignored or eliminated is an important pre-processing step for doing interesting work with nanosat imagery, since we want to detect changes over time, such as cars appearing, forests disappearing, etc. Being able to first detect and eliminate clouds (which change often and could lead to false positives) is therefore important.

Note that even though the Cloudless pipeline is currently focused on cloud detection and localization, it covers what would be needed for any visual orbital detection and localization task and, with tweaking, can be applied to other problems.

Related Work

Cloud detection and elimination are by no means solved problems. However, there are currently two broad non-deep learning approaches for cloud detection and elimination that help with the problem.

The first involves detailed, hand-rolled feature extraction pipelines that attempt to extract relevant pixel-level details from infrared, thermal, multispectral bands, etc. and then use various thresholds that are sometimes location, day, and season-specific. They are complicated and not necessarily universal. In addition, the Planet Labs satellites only currently work in the visual spectrum (except for a few satellites gained through a recent acquisition named RapidEye, which can detect in the near infrared), which means the multispectral and thermal bands some of these techniques depend on are not present. A good survey of these hand-engineered approaches is in [1].

The second approach involves taking a stack of satellite imagery of the same location over many days and “averaging” them together. This will by definition remove the clouds from the image, leaving consistent details such as cities, roads, surface details, etc. behind. Unfortunately, it also removes much of the “changes over time” information, stripping away what makes Planet Labs imagery so compelling. A good example of the stacking approach is in an open source library from Planet Labs themselves named plcompositor [2].

Approach & Results

Convolutional neural nets (CNNs) have shown surprisingly strong results in image classification tasks in recent years and seem to be a natural fit for this problem. Once trained, a CNN can be used to do object localization to draw bounding boxes around candidate detected items. However, supervised training of CNNs depends on large amounts of training data, which does not readily exist for orbital satellite data, requiring us to bootstrap it ourselves. Cloudless therefore consists of three pieces:

  • An annotation tool that takes data from the Planet Labs API and allows users to draw bounding boxes around clouds to bootstrap training data.
  • A training pipeline that takes annotated data, runs it on EC2 on GPU boxes to fine tune a neural network model using Caffe, and then generates validation statistics to relate how well the trained model performs.
  • A bounding box system that takes the trained cloud classifier and attempts to draw bounding boxes on orbital satellite data, in this case clouds.

The annotation tool is fairly straightforward; it draws imagery from the Planet Labs API, normalizes and pre-processes it, and then presents it to a user through a web browser to draw bounding boxes over candidate objects:


Figure 5 Animated GIF showing annotation tool in action, drawing bounding boxes around clouds

The training portion takes the annotated images, chops them into pieces representing images that either contain clouds or not, splits them into 80% training and 20% validation sets, and trains a binary classifier with Caffe. The model itself is a fine-tuned version of AlexNet. The open source project includes scripts to train these on GPU instances on Amazon EC2, to explore different hyperparameter settings in parallel quickly, and to download and query the trained results to understand how well training went on the validation data; see the two example graphs generated by the training pipeline.

Finally, the trained classifier is fed into a Python-based implementation of Selective Search [3][4] in order to draw candidate bounding boxes around clouds. As part of the work on this project, the Python Selective Search library was backported from Python 3 to Python 2.7 to be compatible with Caffe and the rest of the Python 2.7-based Cloudless pipeline. Here is example before-and-after output showing detected clouds from the final trained model (detailed below in the Trained Model section); detected clouds are overlaid with yellow boxes below:


Figure 6 Satellite image before bounding boxes shown

Figure 7 Satellite image with detected bounding boxes from Cloudless overlaid in yellow

The bounding box system also outputs a JSON file that gives detected cloud coordinates for later use by downstream computer vision consumers. A rough sketch of how these pieces could fit together follows.
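The post does not include the pipeline code inline, but the combination of Selective Search proposals and the trained classifier could be sketched as follows, using the Python selectivesearch package; the scale/sigma/min_size values are illustrative, and classify_patch is a hypothetical wrapper around the fine-tuned model that returns P(cloud) for an image crop:

import selectivesearch # pip package 'selectivesearch'

def cloud_boxes(image, classify_patch, prob_threshold=0.5):
    # Propose candidate regions, then keep those the classifier scores as cloud.
    _, regions = selectivesearch.selective_search(image, scale=500, sigma=0.9, min_size=10)
    boxes = []
    for region in regions:
        x, y, w, h = region['rect']
        if w == 0 or h == 0:
            continue
        patch = image[y:y + h, x:x + w]
        if classify_patch(patch) >= prob_threshold:
            boxes.append({'x': int(x), 'y': int(y), 'w': int(w), 'h': int(h)})
    return boxes # easy to serialise to the JSON file mentioned above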

Trained Model

It took quite a number of iterations to get a model with decent results. The major approaches are detailed in Table 1, with the final best approach bolded:


Table 1 Performance results for different model setups, detailed later

All iterations were based on a BVLC AlexNet model from the Caffe Model Zoo [6], with fine-tuning done on Cloudless annotation data. Convolutional layers were frozen during fine-tuning, with training done only on the final fully connected layer. The AlexNet ImageNet output softmax was reduced from one thousand classes to two, indicating whether a cloud is present in a given image or not. All iterations used a standard 80/20 split between training and hold-out validation data. Training for most runs converged fairly quickly, as you can see in the graph below. Finally, all iterations used imagery restricted to the San Francisco Bay Area, as we were restricted via the Planet Labs API to the state of California.


Figure 8 Training converged very rapidly; this shows accuracy converging after a few hundred iterations and then stabilizing for the rest of the 20,000 iterations.
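For readers more comfortable with Keras (used elsewhere on this blog) than Caffe, the fine-tuning recipe described above corresponds roughly to the following sketch; the pretrained_convnet helper is hypothetical, and Cloudless itself fine-tuned the BVLC AlexNet in Caffe:

from keras.layers import Dense, Flatten
from keras.models import Model

base = pretrained_convnet() # hypothetical: a pretrained feature extractor ending in a conv feature map
for layer in base.layers:
    layer.trainable = False # freeze the convolutional features, as in the Cloudless setup

features = Flatten()(base.output)
cloud_prob = Dense(2, activation='softmax')(features) # 1000-way ImageNet head replaced by 2 classes

model = Model(input=base.input, output=cloud_prob)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X, Y, validation_split=0.2, ...) # 80/20 train/validation split as described in the text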

700 images from the Planet Labs API were hand-labelled via the annotation tool and used for training; the final accuracy was fairly low, only 62.5%. This turned out to be caused not so much by the small amount of data as by fairly extensive “motion blur” affecting some of the satellite imagery. An example is shown below:


Figure 9 Motion blur in Planet Labs imagery

Despite this, the bounding boxes generated were still in the realm of plausibility, though with some false positives caused by the motion blur. An example from the earlier image:


Figure 10 Planet Labs motion blur image with detected cloud bounding boxes in yellow

Getting to a better accuracy involved three things:

1) About two months after Cloudless was started, Planet Labs purchased a fleet of satellites named RapidEye. The data from these satellites is clearer than that from the current Planet Labs Doves. An example image:


Figure 11 Example RapidEye imagery showing clearer resolution without motion blur

2) The same location was annotated across 65 different days and fed into the network, allowing the network to ‘learn’ what is constant in a location and what changes. Here are four days of the larger-format imagery that fed into the annotation tool, as an example:


Figure 12 A single day from the San Francisco Bay Area

Figure 13 A single day from the San Francisco Bay Area

Figure 14 A single day from the San Francisco Bay Area

Figure 15 A single day from the San Francisco Bay Area

In effect, the network was given examples of what the San Francisco Bay Area looks like when it’s clear and when it’s cloudy, allowing it to generalize from this.

3) More images were hand-labelled with the annotation tool, totalling about 4000 images.

These three steps added up to a much higher final accuracy of 89.69% on the validation data, with much stronger precision and recall than the earlier run. While it’s difficult to say exactly how the hand-engineered, non-deep learning cloud detection pipelines detailed in Related Work above and in [1] perform, as performance is fairly dependent on location and hand-chosen thresholds, general accuracy is reported to be in the 80% to 90% range, making Cloudless broadly competitive with them. Unlike those other solutions, however, Cloudless could be fed with multi-class annotated data and trained for a wide variety of tasks without a hand-rolled feature engineering pipeline focused just on clouds.

For the final best solution, see the final confusion matrix and the details on the raw data fed into the neural network training pipeline.

Experiments were done to attempt to get beyond 89.69% accuracy via greater data augmentation. All iterations used Caffe’s built-in cropping and mirroring transformations during training; experiments were also done with manually rotating all labelled images by 90 degrees in four directions during data preparation to see if this aided training. Counterintuitively, performance was actually lower, as detailed in Table 1 above.

Limitations

Since Cloudless uses only visible spectrum imagery, it does not work with night-time images. In addition, Cloudless has not been trained or tested on regions with snow, which has been reported to cause issues with other cloud detection schemes.

In addition, the Selective Search bounding box scheme chosen is fairly computationally intensive. On my Macbook Pro laptop with a 2.5 GHz Intel Core i7 and an NVIDIA GPU, for example, processing a single image took about two minutes. This is not terrible for Planet Labs, as a given location would only be imaged once a day and this task can be parallelized fairly easily.

Unfortunately, generating the bounding boxes depends on a number of hyperparameters for Selective Search that are not always generic across input images and requires some hand tuning for a given location; it’s not always “hands free” yet. Example hyperparameter values are given on the Github Cloudless page. This motivates some of the ideas in the “Future Work” section below.

Future Work

A good future step is to eliminate Selective Search’s computational slowness and hand-crafted hyperparameter tuning from the equation. One possibility is to turn the convolutional neural network currently in Cloudless into a deconvolution neural network [7], feeding in a raw image and having the output be a pixel mask where each value is the class of that pixel, whether it’s a cloud or some other desired classification. This might allow the network itself to learn what “thresholds” are appropriate for different input images, eliminating the kind of hand tuning the Selective Search bounding boxes currently require, as well as providing pixel-level classification and localization.

To increase accuracy in general, it’s clear from non-deep learning cloud detection techniques that non-visible spectral bands can be important for inferring the presence of clouds or other items. When a user annotates a satellite image in the annotation tool in the visual spectrum, other side-channel non-visual bands could be saved and also fed into the network, such as infrared, thermal, etc. As Planet Labs integrates more spectral bands into their satellites, this will become more of an option.

In addition, we should be able to feed in the latitude, longitude, altitude, day and time of the year, and the position of the sun for each pixel as input into the network. This is clearly information that a human would use to identify what something is. For example, if you were told that an image came from the North Pole, you would probably classify an uncertain white patch as snow, while if you were told it came from a tropical forest on the equator you’d assign a very high probability to a blurry white patch being a cloud. If you weren’t sure what a white patch was while looking at an orbital image of NYC, you’d probably have different answers depending on whether it was winter or summer. It seems only natural to give these same features to a neural network to help it decide when it’s dealing with “boundary” cases where it’s not quite sure; having this information would allow it to take an essentially Bayesian approach to figuring out what it is looking at and make an informed guess. A sketch of how such side information could enter the network is given below.
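As a sketch (mine, not part of Cloudless) of how such side information could be concatenated with the image features in a Keras-style model, where conv_stack_output stands in for the output of the convolutional layers:

from keras.layers import Input, Dense, Flatten, merge

image_features = Flatten()(conv_stack_output) # hypothetical output of the convolutional layers
meta_inp = Input(shape=(5,)) # e.g. latitude, longitude, altitude, day of year, sun elevation
combined = merge([image_features, meta_inp], mode='concat') # side channel joins the image features
hidden = Dense(128, activation='relu')(combined)
cloud_prob = Dense(2, activation='softmax')(hidden)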

Finally, the annotation tool itself can be extended to run on Mechanical Turk to bootstrap even more data, which would probably be necessary if the deconvolution approach earlier is taken as it has more free parameters to train. If this is done, the annotation tool will have to be extended with a training step to gauge how well labellers do as well as a validation step to run labelled data by other groups of users to gauge their quality, as in [8]. Once done, and once Planet Labs has satellite data with a greater resolution, this can be used to bootstrap multi-class data sets with examples of cars, roads, power lines, ocean tankers, buildings, different biomes, etc.

Conclusion

In this blog post I’ve introduced Cloudless, a three-part open source orbital satellite pipeline that provides annotation tools, a deep learning component, and a bounding box system. It is currently focused on cloud detection and localization, but could be extended for other non-cloud tasks in the future. The final trained model has an accuracy of 89.69% on cloud detection, generally competitive with hand-tuned non-deep learning cloud detection pipelines.

One strong aspect of deep learning is that future proposed enhancements, such as those detailed in “Future Work” above, should help with end-to-end learning of most computer-vision-oriented orbital satellite tasks, not just cloud detection. Rather than improving only cloud detection, as with the non-deep learning approaches that rely on manual feature engineering, investments in deep learning can help improve a wide range of computer vision tasks beyond the problem at hand.

The GitHub page for Cloudless can be found here.


Special thanks to Johann Hauswald and Max Nova for their help on Cloudless during Hack Week, to Planet Labs for graciously giving us access to their data and API, and to Dropbox for making space for Hack Week and Cloudless.

Bibliography

[1] Gary Jedlovec, “Automated Detection of Clouds in Satellite Imagery”, NASA Marshall Space Flight Center

[2] Frank Warmerdam, Pixel Lapse Compositor (plcompositor)

[3] J. R. R. Uijlings et al., Selective Search for Object Recognition, IJCV, 2013

[4] Koen van de Sande et al., Segmentation As Selective Search for Object Recognition, ICCV, 2011

[5] Brad Neuberg, fork of Selective Search

[6] Caffe Model Zoo

[7] Hyeonwoo Noh et al., Learning Deconvolution Network for Semantic Segmentation

[8] Hao Su et al., Crowdsourcing Annotations for Visual Object Detection


 



Introducing the PiAQ


Recently, we’ve gotten a lot of buzz for a project we’re really excited about here in the labs – our PiAQ. It’s something we’re glad to see get attention, because we think it’s going to be a valuable tool to increase the quality of life for a lot of people.

The PiAQ is an indoor air quality sensor built to fit on a $35 credit-card sized computer called a Raspberry Pi. Raspberry Pis are great for educational programs or for do-it-yourself projects like our sensor. The PiAQ uses that small computer to power air quality monitors, as well as run software (like our Rosetta Home, home-monitoring software in development here in the lab) that can interpret and visualize what those monitors are tracking.

The Raspberry Pi provides a great platform for us to build upon, as it’s a trusted low-cost alternative to many of the commercial devices that exist in the Internet of Things space. Since the Pi is compatible with a variety of devices and can run many different types of software, it’s the perfect candidate to start building up your smart home. There have been several Pi-based smart home DIY kits, including some with environmental sensors.

The goal for this project is to make information about the air people breathe more accessible. While the prevailing thought has been that outdoor air quality – especially in cities – is worse than indoor air, did you know that the reverse can actually be true? In fact, the EPA estimates that indoor air quality can be two to five times worse than outdoors in some places, which is especially troubling considering we spend most of our time indoors.

As well as helping people monitor their own indoor air quality, we also built the PiAQ as a launching point for a larger project we’ve been working on: Rosetta Home 2.0. Rosetta Home is a whole-home automation system that will include indoor air quality sensing. By building on top of the Pi, we not only are able to use the Pi’s existing hardware and software, but we also bring the project forward into the Pi community at large. By opening the project up to this community, we get one of the largest groups of “beta-testers” possible; people who are passionate about technology and interested in helping us make the best sensor we can. To date, there have been over 10 million Raspberry Pi units sold.

At CRT Labs, we’re excited about what open-source hardware and software can do for emerging technologies. Currently, you can view our GitHub repository for the PiAQ (and our other projects) and download our hardware schematics as well as our software. You can build your own PiAQ, or modify the software to your needs. It also allows you to help us debug the code, or find any flaws in our hardware. The more eyes we have on our projects, the quicker we can iterate them.


It’s great that we have this community open to us, not just to create this product, but to allow us to use what we learn from the PiAQ to expand our indoor air quality sensing even further. We’re working on developing stand-alone sensors that can be networked together in order to give you a sense of the indoor air quality of your whole home. That stat above, where the EPA says your home’s air can be up to five times worse than outdoors, can affect your daily life. For example, NAR’s CTO was curious about his home’s air quality when his wife complained of frequent headaches. He brought home an indoor air quality (IAQ) sensor, and found out his home’s CO2 was above recommended levels. To counter this, he started opening the windows at night and running his whole-house fan and quickly after, his family’s headaches disappeared.

Our IAQs will measure not only CO2, but temperature, humidity, barometric pressure, light and sound intensity, volatile organic compounds (VOCs), CO, and NO2. Temperature, humidity, barometric pressure, and light and sound intensity all contribute to your home’s comfort levels. When it’s too humid, you know to run a dehumidifier; if your baby can’t sleep, you can check the sound levels to see if maybe the party next door is louder than you thought. CO, CO2, and NO2 can actually cause short and long-term health effects. In the short-term, these pollutants can cause headaches, drowsiness, sinus issues, and light-headedness; in the long-term, they have serious consequences, especially when exposure lasts for hours at a time. If you know about what levels these gases occur within your home, you’re able to start mitigating them, like our CTO did when CO2 reached high levels in his house.

The PiAQ will be available to purchase starting in Q1 2017. We are looking for research partnerships; email us and tell us about your projects, and we can work together to see how the PiAQ can fit your needs.

The PiAQ is the exciting first step in beginning to create an ecosystem where the home’s health is monitored just like we monitor our own fitness. The FitBit got people talking about their own health – we now all know that 10,000 steps is a good goal to maintain our body’s fitness. Our goal is to get people talking about the home in the same way. We think about what we do in the lab in terms of our REALTOR® members’ code of ethics: “Under all is the land. Upon its wise utilization and widely allocated ownership depend the survival and growth of free institutions and of our civilization.” We are striving to change how people think about their homes, and by making the home’s health a priority, we can help to positively impact everyone’s lives.


39 Open Source Swift UI Libraries For iOS App Development


This is an “amazing” series of open source projects.

Developed by Apple Inc, Swift is currently the most popular programming language on Github and it has one of the most active communities that kindly contribute their open source projects.

Open source libraries can be sweet and they can make your life dramatically easier in building your iOS apps. For those iOS folks spending hours and days hunting for good libraries, you may find this post useful.

Mybridge AI evaluates the quality of content and ranks the best articles for professionals. In this observation we’ve compared nearly 2,700 open source Swift UI libraries to select the Top 39. With only 1.4% chance to be included in the list, the average number of Github stars was 2,527.

This is specific to Swift “UI” (User Interface) libraries, broken down into 12 groups: Animation, Popup, Feed, Onboarding, Color, Image, Graph, Icon, Form, Layout, Message, Search.

If you’re looking for open source Swift “Apps”, follow this link.

<Animation UI>

No 1

Spring: A library to simplify iOS animations in Swift. [9164 stars on Github].


No 2

Material: An animation and graphics framework that is used to create beautiful applications [6120 stars on Github].


No 3

RazzleDazzle: A simple keyframe-based animation framework for iOS, written in Swift. Perfect for scrolling app intros [2291 stars on Github].


No 4

Stellar: A fantastic Physical animation library for swift [1881 stars on Github].


No 5

Macaw: Powerful and easy-to-use vector graphics Swift library with SVG support [594 stars on Github].

<Transition UI>

No 6

PagingMenuController: Paging view controller with customizable menu in Swift [1305 stars on Github].


No 7

PreviewTransition: A simple preview gallery controller [1025 stars on Github].


No 8

PinterestSwift: Transition like Pinterest in Swift [1007 stars on Github].


No 9

Youtube iOS app written in swift 3 [786 stars on Github].


No 10

Twicket Segmented Control: Custom UISegmentedControl replacement for iOS, written in Swift [680 stars on Github].

<Pop up UI>

No 11

SCLAlertView-Swift: Beautiful animated Alert View written in Swift [3056 stars on Github].


No 12

SwiftMessages: Very flexible alert messages written in Swift. [1356 stars on Github].


No 13

XLActionController: Fully customizable and extensible action sheet controller written in Swift 3 [1346 stars on Github].


No 14

Popover: Balloon pop up library like Facebook app, written in pure swift. [852 stars on Github].


No 15

Presentr: Wrapper for custom ViewController presentations [635 stars on Github].

<Feed UI>

No 16

FoldingCell: An expanding content cell inspired by folding paper material [4285 stars on Github].


No 17

ExpandingCollection: A card peek/pop controller [2425 stars on Github].


No 18

DGElasticPullToRefresh: Elastic pull to refresh compontent written in Swift [2308 stars on Github].


No 19

Persei: Animated top menu for UITableView / UICollectionView / UIScrollView written in Swift [2269 stars on Github].


No 20

SCLAlertView-Swift: Beautiful animated Alert View written in Swift by Instagram Engineering. [2443 stars on Github].


No 21

PullToMakeSoup: Custom animated pull-to-refresh that can be easily added to UIScrollView [1301 stars on Github].

<Onboarding UI>

No 22

DZNEmptyDataSet: Empty State UI Library [6552 stars on Github].


No 23

Instructions: Create walkthroughs and guided tours in Swift. [2256 stars on Github].


No 24

Presentation: Make tutorials, release notes and animated pages [1680 stars on Github].

<Color UI>

No 25

Chameleon: Flat Color Framework for Swift Developers [7071 stars on Github].


No 26

Hue: All-in-one coloring utility that you’ll ever need to write in Swift [1612 stars on Github].


No 27

DynamicColor: Yet another extension to manipulate colors easily in Swift [1310 stars on Github].

<Image UI>

No 28

FaceAware: An extension that gives UIImageView the ability to focus on faces within an image when using AspectFill [1424 stars on Github].


No 29

ComplimentaryGradientView: Create complementary gradients generated from dominant and prominent colors in supplied image [384 stars on Github].

<Graph UI>

No 30

Charts: Beautiful charts for iOS built in Swift [11433 stars on Github].


No 31

Paper-switch: A Swift module which paints over the parent view when the switch is turned on. [3065 stars on Github].

<Icon UI>

No 32

Paper Switch: RAMPaperSwitch is a Swift module which paints over the parent view when the switch is turned on. [1849 stars on Github].


No 33

Circle-menu: CircleMenu is a simple, elegant menu with a circular layout [1768 stars on Github].

<Schedule UI>

No 34

JTAppleCalendar: The Unofficial Swift Apple Calendar Library. View. Control. for iOS & tvOS [1026 stars on Github].


No 35

DateTimePicker: A nicer iOS UI component for picking date and time [455 stars on Github].

<Form UI>

No 36

Eureka: Elegant iOS form builder in Swift [4117 stars on Github].

<Layout UI>

No 37

Neon: A powerful Swift programmatic UI layout framework for iPhone & iPad [3439 stars on Github].

<Message UI>

No 38

NMessenger: A fast, lightweight messenger component built on AsyncDisplaykit and written in Swift [1492 stars on Github].

<Search UI>

No 39

Reel-search: A search controller that allows you to choose options from a list [1364 stars on Github].



技术周刊 (Tech Weekly) Vol.10 – React Native | Learn Once, Write Anywhere


Having wrapped up the previous two issues, the introduction (Vol.8 – React, “a 5-minute quick start”) and the intermediate material (Vol.9 – Level up! React), our month-long React study plan is nearly complete. Next, we move into the final stage: React Native.

This issue focuses on learning React Native, from getting started through to project practice. We hope its contents give you a global view of the overall React picture.

Getting Started with React Native

When picking up a new technology, the official documentation is of course the most thorough resource. Even so, after reading the official docs we often still fall into all sorts of pitfalls when we actually use it, so we have selected the following articles to help you avoid as many of those pitfalls as possible while getting started with React Native.

ChanceKing – My first React Native build, or: waiting until the flowers wilted

The author likes React Native because it changes the traditional perception of front-end work and expands the front end’s reach: it can not only operate within the H5 world but also move into native clients, raising the profile of front-end development and giving people something to be excited about. In this article the author records the pitfalls hit while building a React Native project for the first time; if you are also about to get started with React Native, it is worth following along.

听海 JamiE – A guide to React Native basic exercises (part 1) and (part 2)

How is an iOS app developed with React Native? If you are curious, get Mac OS X, Xcode, node and npm ready, type npm install -g react-native-cli and react-native init AwesomeProject in the terminal, and start from displaying a poster, then cover mock data, rendering, and fetching and displaying live data from an API.

陈学家_6174 – React-Native layout

Width units and pixel density, flex layout, image layout, absolute and relative positioning, text elements… Detailed explanations and concise summaries of each feature help you sort out React-Native layout with ease.

陈学家_6174 – Blending React-Native with React-Web

On using React-Native in practice, Facebook’s official position is that React-Native provides a common development approach for multiple platforms, rather than “one codebase, used everywhere”. With that in mind, the author uses a practical example (a simple demo built with SampleApp) to explore how feasible sharing code actually is.

cnsnake11 – An incremental upgrade scheme for React Native

When code or images are modified, the app only needs to use the new bundle file and assets folder to complete an over-the-air upgrade. Based on this idea, the article explains a solution for incremental upgrades.

DesGemini – A first look at React Native SVG based on the react-art library

On mobile, given cross-platform requirements plus the technical experience accumulated on the web, react-art has become a ready-made solution for drawing graphics, and support for react-art has been added on the iOS and Android platforms. Here the author brings you a (world first? =_=) getting-started document.

静逸秋水 – Small tips for React Native development

Many React Native developers come from a front-end background and habitually approach problems from a web perspective, but because of React Native’s own limitations it does not support that many properties and styles. Drawing on their own development practice, the author summarises problems you may meet in future development and offers some small code references, hoping they help.

DesGemini – A survival guide for React Native development in the wilderness

React Native’s growth has been spectacular, but its documentation lags far behind the pace of development, making React Native engineering feel like surviving in the wilderness. The author has been developing a React Native app for a commercial project for nearly half a year and has written up the experience of falling into and climbing out of pits, in the hope that this guide will help.

React Native for Android

This chapter is selected from 侯医生’s “React Native Android” recommendation series. The tutorials discuss React Native in close combination with native Android. The project starts from setting up the React Native Android environment and goes deeper step by step; by following along you can become proficient in react-native-android development.

1. Setting up the React Native Android environment

Although there are many articles on setting up React-Native, most stay at the JS level and do not cover native Android and its development. This article briefly introduces how to use Android Studio and React Native to set up a basic React environment and use it to successfully produce a hello world.

2. Creating a simple RN app (RN from a JS point of view)

After successfully producing a hello world, we explore how to use React-Native to build an interface a little more complex than Hello World, and along the way examine the directory structure of a React Native Android project.

4. Calling Java code from JS in RN

Building on part 3, controlling navigation between native Android activities, we now call that native knowledge from JS. With the two combined, you can develop React-Native applications much more efficiently.

6. React.js fundamentals in React Native

A lot of React-Native material is about styles, internals, environments and so on; now we study the basics of React.js itself: our code, the components we create, and other related knowledge.

8. Implementing the mobile Baidu feed

After studying the articles above, you will have a basic grasp of writing React-Native styles and doing layout. In this section we work through a hands-on example, imitating the mobile Baidu news feed to consolidate your React-Native skills.

Project Showcases

ctriptech – React Native in practice: Ctrip’s Moles framework

Through a presentation of the Moles framework, this talk introduces Ctrip’s hands-on React Native experience, hoping to offer some inspiration. It covers three aspects:

  • What role does the Moles framework play in integrating React Native with our main app?
  • How does the Moles framework bridge Android, iOS, H5 and SEO so that one codebase runs on multiple platforms?
  • What are the components of the Moles framework and how does it work?

静逸秋水 – Building a circular loading bar with React Native

A progress bar is conventionally drawn with canvas or with SVG. How do you write such a progress bar with React Native? You can follow the author and try out this implementation of the progress bar.

DesGemini – A Node.js + React Native graduation project: development notes on an agricultural IoT monitoring system

The IoT monitoring system is divided into three layers overall: the database layer, the server layer and the client layer. The database layer adds a Redis database alongside the original Oracle 11d database. The server layer uses Node.js with the Express framework as the API backend for the client. The client layer is almost entirely React Native code, with Redux used to unify the app’s event dispatch and UI data management. Come and feel the buffs that React Native brings.

王铁手 – A first taste of React-Native: writing an iOS app with JavaScript

Having written a hybrid app and feeling it was not enough, the author got the urge to play with React Native. It is a very simple introduction; if this is your first taste as well, you may find yourself thinking along the same lines as the author.

(End of this issue)


Raspberry Pi Weather Station: Monitoring Humidity, Temperature and Pressure over Internet

Raspberry Pi Weather Station: Monitoring Humidity, Temperature and Pressure over ThingSpeak

Humidity, temperature and pressure are three basic parameters for building any weather station and measuring environmental conditions. We have previously built a mini weather station using Arduino, and this time we are extending the weather station with Raspberry Pi. This IoT-based project aims to show the current humidity, temperature and pressure parameters on an LCD as well as on an Internet server using Raspberry Pi, which makes it a Raspberry Pi weather station. You can install this setup anywhere and monitor the weather conditions of that place from anywhere in the world over the internet; it will not only show the current data but can also show past values in the form of graphs.

 

We have used the DHT11 humidity and temperature sensor for sensing temperature and humidity, and the BMP180 pressure sensor module for measuring barometric pressure. This Celsius-scale thermometer and percentage-scale humidity meter displays the ambient temperature and humidity on an LCD display, and barometric pressure is displayed in millibar or hPa (hectopascal). All this data is sent to the ThingSpeak server for live monitoring from anywhere in the world over the internet. Do check the demonstration video and Python program given at the end of this tutorial.

Raspberry-pi-weather-station-IoT-project

Working and ThingSpeak Setup:

This IoT-based project has four parts. First, the DHT11 sensor senses the humidity and temperature data and the BMP180 sensor measures the atmospheric pressure. Second, the Raspberry Pi reads the DHT11 module's output using a single-wire protocol and the BMP180 sensor's output using the I2C protocol, and converts both sensors' readings into suitable values: a percentage (humidity), degrees Celsius (temperature), and hectopascal or millibar (pressure). Third, these values are sent to the ThingSpeak server using the built-in Wi-Fi of the Raspberry Pi 3. Finally, ThingSpeak analyzes the data and shows it in graph form. An LCD is also used to display these values locally.

Raspberry-pi-weather-station-block-diagram

ThingSpeak provides a very good tool for IoT-based projects. Using the ThingSpeak site, we can monitor our data and control our system over the Internet through the channels and web pages that ThingSpeak provides. ThingSpeak 'collects' the data from the sensors, 'analyzes and visualizes' the data, and 'acts' by triggering a reaction. We have previously explained sending data to ThingSpeak in detail, which you can check out; here we briefly explain how to use ThingSpeak for this Raspberry Pi weather station.

First you need to create an account on the ThingSpeak website and create a 'New Channel' in it. In the new channel you have to define fields for the data you want to monitor; in this project we create three fields, for humidity, temperature, and pressure.

 

Now click on the 'API Keys' tab and save the Write and Read API keys; here we are only using the Write key. You need to copy this key into the 'key' variable in the code.

Raspberry-pi-weather-station-thingspeak-API-key

After that, click on 'Data Import/Export' and copy the Update Channel Feed GET request URL, which is:

https://api.thingspeak.com/update?api_key=30BCDSRQ52AOI3UA&field1=0

Raspberry-pi-weather-station-thingspeak-feed-GET-url

Now we need this 'Feed GET Request URL' in our Python code to open "api.thingspeak.com" and then send data using this feed request as a query string. Before sending the data, the program inserts the temperature, humidity, and pressure values into this query string using variables; check the code at the end of this article.

URL = 'https://api.thingspeak.com/update?api_key=%s' % key
finalURL = URL +"&field1=%s&field2=%s"%(humi, temp)+"&field3=%s" %(pressure)

The DHT11 uses single-wire serial communication to deliver its data. Here we have used the Adafruit DHT11 library for interfacing the DHT11 with the Raspberry Pi. The Raspberry Pi collects the humidity and temperature data from the DHT11 and the atmospheric pressure from the BMP180 sensor, and then sends them to the 16×2 LCD and the ThingSpeak server. ThingSpeak displays the data in graph form as below:

Raspberry-pi-weather-station-humidity-temperature-pressure-charts

DHT11 Humidity & Temperature Sensor and BMP180 Pressure Sensor

You can learn more about DHT11 Sensor and its Interfacing with Arduino here.

Circuit Diagram:

Raspberry-pi-weather-station-circuit-diagram

 

Raspberry Pi Configuration and Python Program:

We are using the Python language here for the program. Before coding, the user needs to configure the Raspberry Pi. You can check our previous tutorials for getting started with Raspberry Pi and for installing & configuring the Raspbian Jessie OS on the Pi.

First of all, we need to install the Adafruit Python DHT sensor library to run this project on the Raspberry Pi. To do this, run the following commands:

sudo apt-get install git-core
sudo apt-get update
git clone https://github.com/adafruit/Adafruit_Python_DHT.git
cd Adafruit_Python_DHT
sudo apt-get install build-essential python-dev
sudo python setup.py install

Installing-adafruit-python-DHT11-library-in-raspberry-Pi

After that, the user needs to enable I2C on the Raspberry Pi by going into the RPi Software Configuration Tool:

sudo raspi-config

Then go to 'Advanced Options', select 'I2C', and enable it.

raspberry-pi-software-configuration-tool-raspi-config

Raspberry-pi-weather-station-Enable-I2C-for-BMP180

The programming part of this project plays a very important role in performing all the operations. First of all, we include all the required libraries, initialize variables, and define the pins for the LCD and DHT11.

import sys
import RPi.GPIO as GPIO
import os
import Adafruit_DHT
import urllib2
import smbus
import time
from ctypes import c_short

#Register Address
regCall   = 0xAA
... ......
 ..... ...

In the def main() function, the code below is used to continuously send the data to the server and display it on the LCD inside a while loop.

def main():

    print 'System Ready...'
    URL = 'https://api.thingspeak.com/update?api_key=%s' % key
    print "Wait...."
    while True:
            (humi, temp)= readDHT()
            (pressure) =readBmp180()

            lcdcmd(0x01)
            lcdstring("Humi#Temp#P(hPa)")
            lcdstring(humi+'%'+"  %sC  %s" %(temp, pressure))
            finalURL = URL +"&field1=%s&field2=%s"%(humi, temp)+"&field3=%s" %(pressure)
            print finalURL
            s=urllib2.urlopen(finalURL);
            print  humi+ " " + temp + " " + pressure
            s.close()
            time.sleep(10)

For the LCD, the def lcd_init() function is used to initialize the LCD in four-bit mode, def lcdcmd(ch) is used to send commands to the LCD, def lcddata(ch) is used to send data to the LCD, and def lcdstring(Str) is used to send a string to the LCD. You can check all these functions in the code given afterwards.

The def readDHT() function given below is used for reading the DHT11 sensor:

def readDHT():
    humi, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHTpin)
    return (str(int(humi)), str(int(temp)))

The def readBmp180() function is used for reading pressure from the BMP180 sensor. The BMP180 can also provide temperature, but here we have only used it for calculating pressure.

def readBmp180(addr=deviceAdd):
  value = bus.read_i2c_block_data(addr, regCall, 22)  # Read calibration data

  # Convert byte data to word values
  AC1 = convert1(value, 0)
  AC2 = convert1(value, 2)
  AC3 = convert1(value, 4)
  AC4 = convert2(value, 6)
  ..... .......
  ........ ......

So this is the basic Raspberry Pi weather station; you can further extend it to measure various weather-related parameters like wind speed, soil temperature, illuminance (lux), rainfall, air quality, etc.

Code:

import sys
import RPi.GPIO as GPIO
import os
import Adafruit_DHT
import urllib2
import smbus
import time
from ctypes import c_short

#Register Address
regCall   = 0xAA
regMean   = 0xF4
regMSB    = 0xF6
regLSB    = 0xF7
regPres   = 0x34
regTemp   = 0x2e

DEBUG = 1
sample = 2
deviceAdd =0x77

humi=""
temp=""

#bus = smbus.SMBus(0)  # for Pi 1 use 0
bus = smbus.SMBus(1)   # for Pi 2/3 use 1

DHTpin = 17

key="30BCDSRQ52AOI3UA"       # Enter your Write API key from ThingSpeak

GPIO.setmode(GPIO.BCM)
# Define GPIO to LCD mapping
LCD_RS = 18
LCD_EN  = 23
LCD_D4 = 24
LCD_D5 = 16
LCD_D6 = 20
LCD_D7 = 21

GPIO.setwarnings(False)
GPIO.setmode(GPIO.BCM)
GPIO.setup(LCD_EN, GPIO.OUT)
GPIO.setup(LCD_RS, GPIO.OUT)
GPIO.setup(LCD_D4, GPIO.OUT)
GPIO.setup(LCD_D5, GPIO.OUT)
GPIO.setup(LCD_D6, GPIO.OUT)
GPIO.setup(LCD_D7, GPIO.OUT)

def convert1(data, i):   # signed 16-bit value
    return c_short((data[i]<< 8) + data[i + 1]).value

def convert2(data, i):   # unsigned 16-bit value
    return (data[i]<< 8) + data[i+1]

def readBmp180(addr=deviceAdd):
    value = bus.read_i2c_block_data(addr, regCall, 22)  # Read calibration data

    # Convert byte data to word values
    AC1 = convert1(value, 0)
    AC2 = convert1(value, 2)
    AC3 = convert1(value, 4)
    AC4 = convert2(value, 6)
    AC5 = convert2(value, 8)
    AC6 = convert2(value, 10)
    B1  = convert1(value, 12)
    B2  = convert1(value, 14)
    MB  = convert1(value, 16)
    MC  = convert1(value, 18)
    MD  = convert1(value, 20)

    # Read temperature
    bus.write_byte_data(addr, regMean, regTemp)
    time.sleep(0.005)
    (msb, lsb) = bus.read_i2c_block_data(addr, regMSB, 2)
    P2 = (msb << 8) + lsb

    # Read pressure
    bus.write_byte_data(addr, regMean, regPres + (sample << 6))
    time.sleep(0.05)
    (msb, lsb, xsb) = bus.read_i2c_block_data(addr, regMSB, 3)
    P1 = ((msb << 16) + (lsb << 8) + xsb) >> (8 - sample)

    # Refine temperature
    X1 = ((P2 - AC6) * AC5) >> 15
    X2 = (MC << 11) / (X1 + MD)
    B5 = X1 + X2
    temperature = (B5 + 8) >> 4

    # Refine pressure
    B6  = B5 - 4000
    B62 = B6 * B6 >> 12
    X1  = (B2 * B62) >> 11
    X2  = AC2 * B6 >> 11
    X3  = X1 + X2
    B3  = (((AC1 * 4 + X3) << sample) + 2) >> 2

    X1 = AC3 * B6 >> 13
    X2 = (B1 * B62) >> 16
    X3 = ((X1 + X2) + 2) >> 2
    B4 = (AC4 * (X3 + 32768)) >> 15
    B7 = (P1 - B3) * (50000 >> sample)

    P = (B7 * 2) / B4

    X1 = (P >> 8) * (P >> 8)
    X1 = (X1 * 3038) >> 16
    X2 = (-7357 * P) >> 16
    pressure = P + ((X1 + X2 + 3791) >> 4)

    return (str(pressure/100.0))

def readDHT():
    humi, temp = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHTpin)
    return (str(int(humi)), str(int(temp)))

def lcd_init():
    lcdcmd(0x33)
    lcdcmd(0x32)
    lcdcmd(0x06)
    lcdcmd(0x0C)
    lcdcmd(0x28)
    lcdcmd(0x01)
    time.sleep(0.0005)

def lcdcmd(ch):
    GPIO.output(LCD_RS, 0)
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch&0x10==0x10:
        GPIO.output(LCD_D4, 1)
    if ch&0x20==0x20:
        GPIO.output(LCD_D5, 1)
    if ch&0x40==0x40:
        GPIO.output(LCD_D6, 1)
    if ch&0x80==0x80:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

    # Low bits
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch&0x01==0x01:
        GPIO.output(LCD_D4, 1)
    if ch&0x02==0x02:
        GPIO.output(LCD_D5, 1)
    if ch&0x04==0x04:
        GPIO.output(LCD_D6, 1)
    if ch&0x08==0x08:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

def lcddata(ch):
    GPIO.output(LCD_RS, 1)
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch&0x10==0x10:
        GPIO.output(LCD_D4, 1)
    if ch&0x20==0x20:
        GPIO.output(LCD_D5, 1)
    if ch&0x40==0x40:
        GPIO.output(LCD_D6, 1)
    if ch&0x80==0x80:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

    # Low bits
    GPIO.output(LCD_D4, 0)
    GPIO.output(LCD_D5, 0)
    GPIO.output(LCD_D6, 0)
    GPIO.output(LCD_D7, 0)
    if ch&0x01==0x01:
        GPIO.output(LCD_D4, 1)
    if ch&0x02==0x02:
        GPIO.output(LCD_D5, 1)
    if ch&0x04==0x04:
        GPIO.output(LCD_D6, 1)
    if ch&0x08==0x08:
        GPIO.output(LCD_D7, 1)
    GPIO.output(LCD_EN, 1)
    time.sleep(0.0005)
    GPIO.output(LCD_EN, 0)

def lcdstring(Str):
    l=len(Str)
    for i in range(l):
        lcddata(ord(Str[i]))

lcd_init()
lcdcmd(0x01)
lcdstring("Circuit Digest")
lcdcmd(0xc0)
lcdstring("Welcomes you")
time.sleep(3) # 3 second delay

# main() function
def main():

    print 'System Ready...'
    URL = 'https://api.thingspeak.com/update?api_key=%s' % key
    print "Wait...."
    while True:
            (humi, temp)= readDHT()
            (pressure) =readBmp180()

            lcdcmd(0x01)
            lcdstring("Humi#Temp#P(hPa)")
            lcdstring(humi+'%'+"  %sC  %s" %(temp, pressure))
            finalURL = URL +"&field1=%s&field2=%s"%(humi, temp)+"&field3=%s" %(pressure)
            print finalURL
            s=urllib2.urlopen(finalURL);
            print  humi+ " " + temp + " " + pressure
            s.close()
            time.sleep(10)

if __name__=="__main__":
    main()

Video:

How to build an autonomous, voice-controlled, face-recognizing drone for $200


More adventures in deep learning and cheap hardware.

October 25, 2016

Early aeronautics, 1818.
Early aeronautics, 1818.(source: Library of Congress on Wikimedia Commons).

After building an image-classifying robot, the obvious next step was to make a version that can fly. I decided to construct an autonomous drone that could recognize faces and respond to voice commands.

Choosing a prebuilt drone

One of the hardest parts about hacking drones is getting started. I got my feet wet first by building a drone from parts, but like pretty much all of my DIY projects, building from scratch ended up costing me way more than buying a prebuilt version—and frankly, my homemade drone never flew quite right. It’s definitely much easier and cheaper to buy than to build.

Most of the drone manufacturers claim to offer APIs, but there’s not an obvious winner in terms of a hobbyist ecosystem. Most of the drones with usable-looking APIs cost more than $1,000—a huge barrier to entry.

But after some research, I found the Parrot AR Drone 2.0 (see Figure 1), which I think is a clear choice for a fun, low-end, hackable drone. You can buy one for $200 new, but so many people buy drones and never end up using them that a secondhand drone is a good option and available widely on eBay for $130 or less.

drone collection
Figure 1. The drone collection in my garage. The Parrot AR drone I used is hanging on the far left. Source: Lukas Biewald.

The Parrot AR drone doesn’t fly quite as stably as the much more expensive (about $550) new Parrot Bebop 2 drone, but the Parrot AR comes with an excellent node.js client library called node-ar-drone that is perfect for building onto.

Another advantage: the Parrot AR drone is very hard to break. While testing the autonomous code, I crashed it repeatedly into walls, furniture, house plants, and guests, and it still flies great.

The worst thing about hacking on drones compared to hacking on terrestrial robots is the short battery life. The batteries take hours to charge and then last for about 10 minutes of flying. I recommend buying two additional batteries and cycling through them while testing.

Programming my drone

Javascript turns out to be a great language for controlling drones because it is so inherently event driven. And trust me, while flying a drone, there will be a lot of asynchronous events. Node isn’t a language I’ve spent a lot of time with, but I walked away from this project super impressed with it. The last time I seriously programmed robots, I used C, where the threading and exception handling is painful enough that there is a tendency to avoid it. I hope someone builds Javascript wrappers for other drone platforms because the language makes it easy and fun to deal with our indeterministic world.
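For instance, the node-ar-drone client is an event emitter, so reacting to the drone's telemetry is a matter of registering callbacks rather than polling in a loop. A small sketch (the 'navdata' event name comes from the node-ar-drone README; the logging is only illustrative):

var arDrone = require('ar-drone');
var client  = arDrone.createClient({ip: '192.168.7.43'});

// Every piece of telemetry arrives as an asynchronous 'navdata' event.
client.on('navdata', function(navdata) {
  console.log(navdata);
});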

Architecture

I decided to run the logic on my laptop and do the machine learning in the cloud. This setup led to lower latency than running a neural network directly on Raspberry Pi hardware, and I think this architecture makes sense for hobby drone projects at the moment.

Microsoft, Google, IBM, and Amazon all have fast, inexpensive cloud machine learning APIs. In the end, I used Microsoft's Cognitive Service APIs for this project because it's the only API that offers custom facial recognition.

See Figure 2 for a diagram illustrating the architecture of the drone:

Smart Drone Architecture
Figure 2. The Smart Drone Architecture. Source: Lukas Biewald.

Getting started

By default, the Parrot AR Drone 2.0 serves a wireless network that clients connect to. This is incredibly annoying for hacking. Every time you want to try something, you need to disconnect from your network and get on the drone’s network. Luckily, there is a super useful project called ardrone-wpa2 that has a script to hack your drone to join your own WiFi network.

It’s fun to Telnet into your drone and poke around. The Parrot runs a stripped down version of Linux. When was the last time you connected to something with Telnet? Here’s an example of how you would open a terminal and log into the drone’s computer directly.

% script/connect "The Optics Lab" -p "particleorwave" -a 192.168.0.1 -d 192.168.7.43
% telnet 192.168.7.43

Flying from the command line

After installing the node library, it’s fun to make a node.js REPL (Read-Evaluate-Print-Loop) and steer your drone:

var arDrone = require('ar-drone');
var client = arDrone.createClient({ip: '192.168.7.43'});
client.createRepl();

drone> takeoff()
true

drone> client.animate('yawDance', 1.0)

If you are actually following along, by now you’ve definitely crashed your drone—at least a few times. I super-glued the safety hull back together about a thousand times before it disintegrated and I had to buy a new one. I hesitate to mention this, but the Parrot AR actually flies a lot better without the safety hull. This configuration makes the drone much more dangerous, though: when the drone bumps into something, the propellers can snap, and it will leave marks in furniture.

Flying from a webpage

It’s satisfying and easy to build a web-based interface to the drone (see Figure 3). The express.js framework makes it simple to build a nice little web server:

var express = require('express');
var path = require('path');

var app = express();

app.get('/', function (req, res) {
 res.sendFile(path.join(__dirname + '/index.html'));
});

app.get('/land', function(req, res) {
 client.land();
});

app.get('/takeoff', function(req, res) {
 client.takeoff();
});

app.listen(3000, function () {
});

I set up a function to make AJAX requests using buttons:



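A minimal sketch of what those buttons might look like, assuming jQuery is loaded on the page and reusing the /takeoff and /land routes defined above (the sendCommand helper is illustrative, not from the original article):

<button onclick="sendCommand('/takeoff')">Takeoff</button>
<button onclick="sendCommand('/land')">Land</button>

<script>
  // Each button fires a simple AJAX GET against the routes served by the
  // express app above; the drone client then takes off or lands.
  function sendCommand(route) {
    $.get(route);  // assumes jQuery is available on the page
  }
</script>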

Streaming video from the drone

I found the best way to send a feed from the drone’s camera was to open up a connection and send a continuous stream of PNGs from my webserver to my website. My webserver continuously pulls PNGs from the drone’s camera using the AR drone library.

var pngStream = client.getPngStream();

pngStream
 .on('error', console.log)
 .on('data', function(pngBuffer) {
       sendPng(pngBuffer);
 });

function sendPng(buffer) {
 res.write('--daboundary\nContent-Type: image/png\nContent-length: ' + buffer.length + '\n\n');
 res.write(buffer);
}
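The sendPng function writes to a long-lived response object. One way to supply that res is a route that keeps the HTTP response open and declares a multipart boundary; this is a sketch under that assumption (the /video route name is mine, and the daboundary string matches the snippet above):

var res;  // long-lived response object that sendPng writes to

app.get('/video', function(req, response) {
  res = response;
  // multipart/x-mixed-replace tells the browser to replace each part (each
  // PNG) with the next one as it arrives, producing a simple live feed.
  res.writeHead(200, {
    'Content-Type': 'multipart/x-mixed-replace; boundary=daboundary'
  });
});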

Running face recognition on the drone images

The Azure Face API is powerful and simple to use. You can upload pictures of your friends, and it will identify them. It will also guess age and gender, both of which I found to be surprisingly accurate. The latency is around 200 milliseconds, and it costs $1.50 per 1,000 predictions, which feels completely reasonable for this application. See below for my code that sends an image and does face recognition.

var oxford = require('project-oxford'),
oxc = new oxford.Client(CLIENT_KEY);

loadFaces = function() {
 chris_url = "https://media.licdn.com/mpr/mpr/shrinknp_400_400/AAEAAQAAAAAAAALyAAAAJGMyNmIzNWM0LTA5MTYtNDU4Mi05YjExLTgyMzVlMTZjYjEwYw.jpg";
 lukas_url = "https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/3/000/058/147/34969d0.jpg";
 oxc.face.faceList.create('myFaces');
 oxc.face.faceList.addFace('myFaces', {url: chris_url, name: 'Chris'});
 oxc.face.faceList.addFace('myFaces', {url: lukas_url, name: 'Lukas'});
}

oxc.face.detect({
 path: 'camera.png',
 analyzesAge: true,
 analyzesGender: true
}).then(function (response) {
 if (response.length > 0) {
  drawFaces(response, filename)
 }
});

I used the excellent ImageMagick library to annotate the faces in my PNGs. There are a lot of possible extensions at this point—for example, there is an emotion API that can determine the emotion of faces.
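The drawFaces call in the detect callback isn't shown above; here is a minimal sketch of what it might do, shelling out to ImageMagick's convert tool and assuming the faceRectangle fields (top, left, width, height) returned by the Face API:

var exec = require('child_process').exec;

// Draw a red box around each detected face by shelling out to ImageMagick.
function drawFaces(faces, filename) {
  var draws = faces.map(function(face) {
    var r = face.faceRectangle;
    return '-draw "rectangle ' + r.left + ',' + r.top + ' ' +
           (r.left + r.width) + ',' + (r.top + r.height) + '"';
  }).join(' ');
  exec('convert ' + filename + ' -stroke red -fill none ' + draws + ' annotated.png');
}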

Running speech recognition to drive the drone

The trickiest part about doing speech recognition was not the speech recognition itself, but streaming audio from a webpage to my local server in the format Microsoft’s Speech API wants, so that ends up being the bulk of the code. Once you’ve got the audio saved with one channel and the right sample frequency, the API works great and is extremely easy to use. It costs $4 per 1,000 requests, so for hobby applications, it’s basically free.

RecordRTC is a great library and a good starting point for doing client-side web audio recording. On the client side, we can add code to record the audio and upload it; the express route below then saves the uploaded file on the server:
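A minimal client-side sketch with RecordRTC, posting the recorded clip to the /audio route handled below (the getUserMedia wiring and the five-second timer are my own assumptions, not from the original article):

// Record a short clip from the microphone and upload it to /audio.
navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
  var recorder = RecordRTC(stream, { type: 'audio' });
  recorder.startRecording();

  setTimeout(function() {
    recorder.stopRecording(function() {
      var form = new FormData();
      form.append('file', recorder.getBlob(), 'audio.wav');
      fetch('/audio', { method: 'POST', body: form });
    });
  }, 5000);  // stop after five seconds of audio
});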

app.post('/audio', function(req, res) {
 var form = new formidable.IncomingForm();
 // specify that we want to allow the user to upload multiple files in a single request
 form.multiples = true;
 form.uploadDir = path.join(__dirname, '/uploads');

 form.on('file', function(field, file) {
       filename = "audio.wav"
       fs.rename(file.path, path.join(form.uploadDir, filename));
 });

 // log any errors that occur
 form.on('error', function(err) {
       console.log('An error has occured: \n' + err);
 });

 // once all the files have been uploaded, send a response to the client
 form.on('end', function() {
       res.end('success');
 });

 // parse the incoming request containing the form data
 form.parse(req)

 speech.parseWav('uploads/audio.wav', function(text) {
       console.log(text);
       controlDrone(text);
 });
});

I used the FFmpeg utility to downsample the audio and combine it into one channel for uploading to Microsoft:

exports.parseWav = function(wavPath, callback) {
 var cmd = 'ffmpeg -i ' + wavPath + ' -ar 8000 -ac 1 -y tmp.wav';

 exec(cmd, function(error, stdout, stderr) {
       console.log(stderr); // ffmpeg writes its progress output to stderr
       postToOxford(callback); // only call the speech API once the downsampled file exists
 });
};

While we’re at it, we might as well use Microsoft’s text-to-speech API so the drone can talk back to us!

Autonomous search paths

I used the ardrone-autonomy library to map out autonomous search paths for my drone. After crashing my drone into the furniture and houseplants one too many times in my living room, my wife nicely suggested I move my project to my garage, where there is less to break—but there isn’t much room to maneuver (see Figure 3).

Flying the drone
Figure 3. Flying the drone in my “lab.” Source: Lukas Biewald.

When I get a bigger lab space, I’ll work more on smart searching algorithms, but for now I’ll just have my drone take off and rotate, looking for my friends and enemies:

var autonomy = require('ardrone-autonomy');
var mission = autonomy.createMission({ip: '10.0.1.3', frameRate: 1, imageSize: '640:320'});

console.log("Here we go!")

mission.takeoff()
         .zero()         // Sets the current state as the reference
         .altitude(1)
         .taskSync(console.log("Checkpoint 1"))
         .go({x: 0, y: 0, z: 1, yaw: 90})
         .taskSync(console.log("Checkpoint 2"))
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 180})
         .taskSync(console.log("Checkpoint 3"))
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 270})
         .taskSync(console.log("Checkpoint 4"))
         .hover(1000)
         .go({x: 0, y: 0, z: 1, yaw: 0})
         .land()

Putting it all together

Check out this video I took of my drone taking off and searching for my friend Chris:

Conclusion

Once everything is set up and you are controlling the drone through an API and getting the video feed, hacking on drones becomes incredibly fun. With all of the newly available image recognition technology, there are all kinds of possible uses, from surveying floorplans to painting the walls. The Parrot drone wasn’t really designed to fly safely inside a small house like mine, but a more expensive drone might make this a totally realistic application. In the end, drones will become more stable, the price will come down, and the real-world applications will explode.

Microsoft’s Cognitive Service Cloud APIs are easy to use and amazingly cheap. At first, I was worried that the drone’s unusually wide-angle camera might affect the face recognition and that the loud drone propeller might interfere with the speech recognition, but overall the performance was much better than I expected. The latency is less of an issue than I was expecting. Doing the computation in the Cloud on a live image feed seems like a strange architecture at first, but it will probably be the way of the future for a lot of applications.

Article image: Early aeronautics, 1818. (source: Library of Congress on Wikimedia Commons).

