
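This article introduces deep learning libraries for a range of programming languages, including Python, Java, and Haskell.
The graduation project is probably one of the most painful parts of four years at university: pick the wrong topic and it is easy to waste a year on a marginal project that is of no real use and teaches you nothing. Fortunately, in my hardware-oriented major, my advisor does not know software. He just wanted a monitoring system for an agricultural IoT deployment, and all he could give me was an Oracle 11g database holding the sensor data saved by an IoT system over one year of operation… That’s all. And because he does not know software, he is clearly results-oriented: as long as I hand in a mobile client and a server, he will not care how many bleeding-edge technologies I used along the way.
So what is there to say? Let's go! In a spirit of mischief, I decided to adopt the newest, riskiest, most-likely-to-be-abandoned technologies in the industry and combine them into a geek hodgepodge. Imagining the bewildered face of the senior student who will one day take over my work, I put on an evil grin. All of it just to satisfy my craving for new toys.
All of the code is open-sourced under the GPL license: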

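At the database layer, in addition to the existing Oracle 11g database, I added a Redis database. The reasons for adding a second database are:
At the server layer, the Express framework on Node.js serves as the API backend for the client. Node.js's single-threaded, asynchronous concurrency model makes it easy to reach a high QPS, which suits an API backend well. The framework design and main features are shown in the figure below:
A grandiose phrase like "gateway layer: authentication module" essentially boils down to app.use(jwt({secret: config.jwt_secret}).unless({path: ['/signin']}));
The client layer is mostly React Native code. The exception is chart generation for monitoring data (see the figure below): React Native currently has no mature open-source implementation, and drawing the charts in native code would require an architecture in which native and React Native views nest inside each other, which brings its own potential difficulties. So I ended up embedding an HTML page whose front-end code uses Baidu's ECharts library to draw the charts. The resulting client is mostly React Native plus a small amount of HTML5.
In addition, Redux is used to unify event dispatching and UI data management across the app. It is fair to say that if React Native goes down in history, Redux will certainly be one of the main reasons. More on this later.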

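Writing server-side programs often involves a large number of asynchronous operations, such as database reads, network requests, and JSON parsing. Depending on the business scenario, these operations often have to run in a particular order. The code also has to stay readable, and blindly nesting callbacks only leads to nearly unreadable callback hell. Finally, because JavaScript runs in a single-threaded environment, we also need to avoid imposing unnecessary ordering, which would hurt performance. For these reasons I used the Promise pattern in this project to handle logic involving multiple asynchronous steps, as in the following code: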
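function renderGraph(req, res, filtereds) {
    var x = [];
    var ys = [];
    var titles = [];
    filtereds[0].forEach(function(row) {
        // ... (loop body missing in the source)
    });
    // ... (start of the loop over each `filtered` result set missing in the source)
        if (filtered[0] == undefined)
            // Fail fast for safety, even if some of the queries succeeded.
            throw new Error('The database returned an empty result');
        var y = [];
        filtered.forEach(function(row) {
            // ... (loop body missing in the source)
        });
        titles.push(filtered[0].DEVICENAME + ': ' + filtered[0].DEVICECODE);
    // ... (end of the loop over result sets missing in the source)
    res.render('graph', {
        titles: titles,
        dataX: x,
        dataY: ys,
        height: req.query.height == undefined ? 200 : req.query.height,
        width: req.query.width == undefined ? 300 : req.query.width,
    });
}

function resFilter(resolve, reject, connection, resultSet, numRows, filtered) {
    // ... (the call that fetches rows from resultSet is missing in the source;
    //      the function below is its callback)
    function (err, rows) {
        if (err) {
            // ... (missing in the source)
        } else if (rows.length == 0) {
            process.nextTick(function() {
                // ... (missing in the source)
            });
        } else if (rows.length > 0) {
            // ... (missing in the source)
            // Recurse to fetch and filter the next batch of rows.
            resFilter(resolve, reject, connection, resultSet, numRows, filtered);
        }
    }
}

function createQuerySingleDeviceDataPromise(req, res, device_id, start_time, end_time) {
    return oracle.getConnection()
        .then(function(connection) {
            return oracle.execute(
                // ... (start of the SQL string missing in the source)
                DEVICE.DEVICEID = :device_id\
                // ... (missing in the source)
                BETWEEN :start_time AND :end_time\
                // ... (end of the SQL string and bind parameters missing in the source)
                    outFormat: oracle.OBJECT,
                    resultSet: true
                // ... (missing in the source)
                .then(function(results) {
                    var filtered = [];
                    var filterGap = Math.floor(
                        (end_time - start_time) / (120 * 100)
                    );
                    return new Promise(function(resolve, reject) {
                        resFilter(resolve, reject,
                            connection, results.resultSet, filterGap, filtered);
                    });
                })
                .catch(function(err) {
                    // ... (missing in the source)
                        status: 'error',
                        message: err.message
                    // ... (missing in the source)
                    process.nextTick(function() {
                        // ... (missing in the source)
                    });
                });
        });
}

function secureCheck(req, res) {
    let qry = req.query;
    if (
        qry.device_ids == undefined
        || qry.start_time == undefined
        || qry.end_time == undefined
    ) {
        throw new Error('device_ids, start_time, or end_time parameter is undefined');
    }
    if (req.query.end_time < req.query.start_time) {
        throw new Error('The end time is earlier than the start time');
    }
}

router.get('/', function(req, res, next) {
    try {
        var device_ids;
        var queryPromises = [];
        secureCheck(req, res);
        device_ids = req.query.device_ids.toString().split(';');
        for(let i=0; i<device_ids.length; i++) {
            // The opening of this call is missing in the source; it is
            // reconstructed from the argument list and the text that follows.
            queryPromises.push(createQuerySingleDeviceDataPromise(
                req, res, device_ids[i], req.query.start_time, req.query.end_time));
        }
        // Reconstructed from the description after the listing.
        Promise.all(queryPromises)
            .then(function(filtereds) {
                renderGraph(req, res, filtereds);
            }).catch(function(err) {
                // ... (missing in the source)
                    status: 'error',
                    message: err.message
                // ... (missing in the source)
            });
    } catch(err) {
        // ... (missing in the source)
            status: 'error',
            message: err.message
        // ... (missing in the source)
    }
});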
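This is the logic that generates a line chart for N specified sensors over a period of time. Analyzing the requirements, we need to query the database N times, once per sensor, to obtain N arrays of value objects, and only then can we render the chart's HTML page with the N data sets. As you can see, the outer Promise flow is concentrated in just a few lines: Promise.all(queryPromises).then(renderGraph).catch()
function() {
    return new Promise().then().catch();
}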
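From this we can see that high performance never comes for free. Node.js's excellent behavior under high concurrency is paid for with the high complexity of asynchronous programming. Of course, you can opt out of that programming complexity by not using asynchronous patterns such as Promise or Async and letting the code sink into callback hell, in which case the price is maintenance complexity. Which trade-off to choose is a matter of opinion.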
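The Promise.all flow described above can be tried in isolation. The following is a minimal sketch, not the project's code: queryDevice, its delay parameter, and the row shape are hypothetical stand-ins for the real database query, and only built-in Promise and setTimeout are used.

```javascript
// Simulated per-device query: resolves with rows after a short delay.
// queryDevice and its row shape are hypothetical, not the project's code.
function queryDevice(deviceId, delayMs) {
  return new Promise(function (resolve, reject) {
    setTimeout(function () {
      if (deviceId === '') {
        reject(new Error('empty device id'));
      } else {
        resolve([{ DEVICECODE: deviceId, VALUE: deviceId.length }]);
      }
    }, delayMs);
  });
}

// Start every query at once. Promise.all resolves with the results in the
// same order as the input array, or rejects as soon as any query fails.
function queryAll(deviceIds) {
  var queryPromises = deviceIds.map(function (id, i) {
    // Later devices get shorter delays, to show that completion order
    // does not affect result order.
    return queryDevice(id, (deviceIds.length - i) * 5);
  });
  return Promise.all(queryPromises);
}

queryAll(['A1', 'B22', 'C333'])
  .then(function (results) {
    console.log(results.map(function (rows) { return rows[0].DEVICECODE; }));
  })
  .catch(function (err) {
    console.error('query failed:', err.message);
  });
```

The queries run concurrently and no ordering is imposed among them; the only ordering constraint left is the one the business logic needs, namely that rendering happens after all N results are in.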

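Next, a brief tour of the main screens. You will notice that iOS clearly looks nicer than Android: the visuals were only fine-tuned for iOS, and when ported directly to Android, color differences between displays make the UI look rough. So for a cross-platform React Native app it is well worth keeping two color configuration files, one per platform.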

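The figure above shows the current-data page under the soil moisture tab, on Android and iOS respectively. By default it shows the values of all current sensors, and you can filter by sensor type or monitoring station number. The two conditions can be set independently; after choosing them, tap Search to send a request to the server, and the page refreshes once the data arrives. Thanks to React Native's component-based design, only the Dashboard part at the bottom is refreshed, and any MonitorView that was already rendered is reused rather than re-rendered, which reduces CPU usage. A MonitorView is the small tile for a single sensor; from top to bottom it shows the sensor type, the sensor number, the current reading, and the unit of that reading. Both MonitorView and Dashboard are abstracted as general, reusable components so that they can also be used for weather information and pest monitoring, which speeds up development and reduces code duplication.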

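The figure above shows the historical-data page of the soil moisture screen, on Android and iOS respectively. By default no data is shown; results appear only after you enter the sensor type and monitoring station number, choose a date and time, and tap Search. Unlike the current-data page, this page does not share all of its code between Android and iOS. The date and time pickers are different controls on Android and iOS, and they are wrapped as different components in React Native. The two green time-selection buttons are therefore wrapped as HistoricalDateSelectPad, with one implementation each in the componentIOS and componentAndroid folders. The data panel in the lower part of the page, Dashboard in the code, is the same Dashboard used on the current-data page.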

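The figure above shows the chart page of the soil moisture screen, on Android and iOS respectively. The time pickers, the Search button, and the selection boxes all reuse code from the previous two pages, so there is nothing more to say about them. What is worth mentioning is that the line chart is actually a web page displayed in an embedded WebView. The chart page is built with Baidu's ECharts, a third-party library: the server provides a prewritten front-end template, Express fills in the required data, and the result is sent to the mobile client, which renders the chart. The chart supports removing individual curves, tapping to inspect a specific data point, zooming in and out, and so on.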

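The actions folder holds collections of action creator functions, split by module. In Redux, the State can only be changed through an action, and every action must be a plain JavaScript object. In practice, action objects are rarely declared inline each time; more often they are produced by a function, which is called an Action Creator.
The reducers folder holds the reducer implementations. RootReducer is the root file; it combines the other reducers into a single reducer. A reducer is a function that carries out a State change according to the meaning of an action. Reducers execute synchronously. Given an initState and a sequence of actions, running the reducer at any time, any number of times, should always yield the same newState.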
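The action creator and reducer ideas can be illustrated without the Redux library at all. This is a minimal sketch under assumptions: the RECEIVE_READINGS action type, the state shape, and the names receiveReadings, dashboardReducer, and replay are invented for the example and do not come from the project.

```javascript
// Action creator: returns a plain object describing what happened.
// The action type and payload shape are made up for this example.
function receiveReadings(readings) {
  return { type: 'RECEIVE_READINGS', readings: readings };
}

var initState = { readings: [], loaded: false };

// Reducer: a pure, synchronous function (state, action) -> newState.
// It never mutates the old state; it returns a new object instead.
function dashboardReducer(state, action) {
  if (state === undefined) state = initState;
  switch (action.type) {
    case 'RECEIVE_READINGS':
      return { readings: action.readings.slice(), loaded: true };
    default:
      return state;
  }
}

// Replaying the same actions from the same initial state always
// produces the same final state.
function replay(actions) {
  return actions.reduce(dashboardReducer, initState);
}

var actions = [receiveReadings([21.5]), receiveReadings([21.5, 22.0])];
console.log(JSON.stringify(replay(actions)));
```

Because the reducer has no side effects, calling replay(actions) twice gives equal results and leaves initState untouched, which is the determinism property described above.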
Test tool: OS X Activity Monitor (http_load)

Test tool: Xcode 7.3

Test tool: Android Studio 1.2.0

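For all the pitfalls of developing with React Native, it is cross-platform by nature and outperforms HTML5 on mobile, so for apps that are not too complex, development is very fast, as if it came with a built-in 2x buff.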
At Meguro.es #4 on June 21st, 2016, I talked about drawing animated charts in React Native. The talk covered what I learned while developing a tiny app, Compare. It’s a super simple app for comparing temperatures.
Before creating it, I had no idea what temperatures in a weather forecast, like 15 degrees Celsius, actually felt like. I remember what yesterday was like, but not the numbers. Typical weather forecast apps show only future temperatures, without past records. Thanks to The Dark Sky Forecast API, the app fetches both past records and future forecasts and shows them together.
The app's source code is on GitHub:
There might have been some charting libraries to draw similar charts, but I like to write things from scratch. I like to reinvent the wheel especially when it’s a side project. Thanks to that, I found a way to animate smooth paths with the Animated
If I have to add something to the slides:
Two years ago, I started playing around with cheap 433MHz plugs that can be found almost everywhere. At that time, I got several from different brands, from the well-known Chacon Di-O plugs to the most obscure Chinese/no-name ones, and my goal was to reverse engineer as many protocols as possible. I compiled the result into a little tool I called rf-ctrl (now available on my GitHub), and forgot about it. However, this summer, I needed to find a solution to remotely control my electric heaters (not because I was cold obviously, but because I had the time to do it), and thought it was time to dig up rf-ctrl with a bit of polishing (a Web UI called Home-RF).
Let’s first talk about OOK a little bit. Most of the cheap 433MHz plugs (but also chimes, rolling shutter controllers, thermometers …) use On-Off Keying, or OOK, modulation. The idea is that data are sent in binary form by alternately enabling and disabling the transmitter, and thus the carrier frequency. I found mainly two ways of doing so:
Most of the plugs I found use this scheme, and this is the kind of modulation that rf-ctrl implements. This technique, which could be seen as some kind of Manchester code, allows the receiver to easily recover the clock and sync, since the carrier frequency cannot be enabled or disabled longer than a particular amount of time. The timings for a 0 are often inverted compared to those of a 1, for instance, 160µs-ON/420µs-OFF represents a 0 with the OTAX protocol, while 420µs-ON/160µs-OFF represents a 1. However, this is not systematic, and some protocols use totally different timings, for instance 260µs-ON/260µs-OFF for a 0 and 260µs-ON/1300µs-OFF for a 1 with the Di-O protocol. The data part of the frame is sometimes encapsulated between a starting marker and an ending marker. These markers are also represented with an ON/OFF transition, but with different timings. The whole frame is then repeated a specific number of times, with a delay between the frames that can also be assimilated to either a starting marker without ON state, or an ending one with a long OFF state. Last thing to note is the transition order which is often ON/OFF, but can be OFF/ON as well.
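To make the timing scheme concrete, here is a minimal sketch that turns a bit string into a list of ON/OFF durations using the OTAX and Di-O timings quoted above. It assumes ON/OFF transition order and leaves out start/end markers and frame repetition; the function and variable names are made up for the example and are not part of rf-ctrl.

```javascript
// Timings in microseconds, taken from the text above: [ON, OFF].
var OTAX = { zero: [160, 420], one: [420, 160] };
var DIO = { zero: [260, 260], one: [260, 1300] };

// Turn a bit string into a flat list of [state, durationUs] pairs,
// assuming ON/OFF transition order and no start/end markers.
// Any character other than '1' is treated as a 0.
function encodeBits(bits, timing) {
  var pulses = [];
  for (var i = 0; i < bits.length; i++) {
    var t = bits[i] === '1' ? timing.one : timing.zero;
    pulses.push(['ON', t[0]], ['OFF', t[1]]);
  }
  return pulses;
}

// Total air time of the data part of one frame, in microseconds.
function frameDurationUs(bits, timing) {
  return encodeBits(bits, timing).reduce(function (sum, p) {
    return sum + p[1];
  }, 0);
}

console.log(encodeBits('01', OTAX));
console.log(frameDurationUs('01', DIO));
```

With OTAX every bit takes the same 580µs, while with Di-O a 1 takes three times as long as a 0, so the duration of a Di-O frame depends on its content.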
This is actually the “real” low-level way of doing OOK things. You can even describe the previous one that way by choosing a bit-rate (1/Tb) high enough to represent the previous ON/OFF transitions by a succession of ones and zeros that will match the timings. This kind of coding is rather found in high-end devices, like old car keys and more secure plugs/rolling shutters. It was not compatible with the HE853 dongle I had at that time, and thus is not supported by rf-ctrl. However I played with it at a point in order to control the rolling shutters and plugs from the Somfy brand, and to test TI’s CC110x transceiver, but that’s not the purpose of this post.
To replicate a protocol, one must understand two things. The OOK timings (physical characteristics) are the first and the easiest, while the actual data format of the frame is the second.
The easiest way to capture a frame is to use a $1 433MHz (433.92MHz actually) receiver connected to either an oscilloscope or a digital analyzer. You will get something like this (Sumtech protocol):
But if you do not have this kind of receiver but have an oscilloscope lying around, you can also use a simple wire of around 17 cm (= 3×10^8/(4×433×10^6) = lambda/4) connected to one of the inputs ! You will get something like this, which is enough to understand the underlying timings (Idk protocol this time):
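The wire length is just a quarter of the wavelength, which is easy to recompute for other frequencies. A small sketch, using the same rounded speed of light as the formula above (the function name is made up for the example):

```javascript
var SPEED_OF_LIGHT = 3e8; // m/s, the same rounded value as in the text

// Length of a quarter-wave wire antenna, in centimeters.
function quarterWaveCm(frequencyHz) {
  var wavelengthM = SPEED_OF_LIGHT / frequencyHz;
  return (wavelengthM / 4) * 100;
}

console.log(quarterWaveCm(433.92e6).toFixed(1) + ' cm');
```

This ignores the velocity factor of a real wire, so treat the result as a starting point rather than an exact cut length.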
Thanks to this, you can measure the expected timings and the number of times the frame needs to be repeated. It’s time to start writing down zeros and ones on a sheet of paper.
Now, what remains is the actual data to send. Most of the time, a frame consists of a remote ID which is the ID of the remote that sends the frame, a device ID which is just the number of the button pressed on the remote, an action, like ON or OFF, which is most likely 1 or 0, some kind of checksum, and some fixed values. In some cases there are additional values that change every time a button is pressed. They are called rolling codes, and are found in brands like Somfy. These kinds of codes are often harder to reverse, but the cheap plugs do not use them. Finally, some protocols add a simple obfuscation layer on top of the frame, like an XOR for instance.
To understand a protocol, the best method remains to gather as many frames as possible, while writing down what generated them. The first step is to determine if two frames generated by pushing the same button are indeed the same. It will most likely be the case, but if not, you need to find out which part of the frame changes. It can be a simple counter, or something more clever. Remember that if there is some kind of encryption/obfuscation, the whole frame can change because of a simple counter. Anyway, you need to scratch your head and find the solution by comparing as many frames as possible.
Assuming all frames generated by one button are the same, the next thing to do is to change one parameter at a time, and look at the result to identify the different fields. For instance, press the ON and OFF button of the same plug number, on the same remote, and compare the resulting frames. Only a small part of it should change, part that you can now identify as the action field.
Then press the ON button for another plug, and compare to the ON button for the first plug. Check that 1) the action field remains the same, 2) something else changed. This something else is probably the device ID. You can then try to open the remote, and look for some kind of multi-switch or jumpers. You will not necessarily find something in all remotes since some will have their ID stored in an EEPROM or something like that, but if you do find something, try to change it and check the generated frames. This will most likely help to find the remote ID.
If you see a part of the frame that seems to change only when something else changes, then you might just have identified a checksum. Try to find how these bits can be computed from the other ones. It can be, for instance, a simple sum or an XOR. Repeat the procedure until you are convinced that all those fields behave as assumed.
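The "change one parameter and compare" procedure can be partly automated. The sketch below lists the bit positions at which two captured frames differ; the frames are invented for illustration and do not belong to any real protocol, and diffBits is a made-up helper name.

```javascript
// Return the bit positions (0 = first bit) at which two equal-length
// frames, written as strings of '0' and '1', differ.
function diffBits(frameA, frameB) {
  if (frameA.length !== frameB.length) {
    throw new Error('frames must have the same length');
  }
  var positions = [];
  for (var i = 0; i < frameA.length; i++) {
    if (frameA[i] !== frameB[i]) positions.push(i);
  }
  return positions;
}

// Made-up captures: same remote, plug 1 ON / plug 1 OFF / plug 2 ON.
var plug1On = '110100101001';
var plug1Off = '110100101000';
var plug2On = '110100111001';

console.log('action field candidates:', diffBits(plug1On, plug1Off));
console.log('device ID candidates:', diffBits(plug1On, plug2On));
```

Positions that show up in the ON/OFF comparison are candidates for the action field, and positions that show up when only the plug changes are candidates for the device ID. A checksum would show up in both lists.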
Now, keep in mind this is just a generic description of a 433MHz device. Some will not fit the mold and might have, for instance, more or fewer fields. The frame format can even be completely different.
Once the frame format is understood, it’s time to test ! For this you will need a 433MHz transmitter. I first used this HE853 USB dongle, which works fine with a regular PC, but I found out it was easier to just use this $1 transmitter connected to a Raspberry Pi, a TP-Link TL-WR703N router, or any device that offers GPIOs. And this is where rf-ctrl comes in handy. It uses a back-end/front-end (transmitter driver/protocol driver) logic that makes it easy to implement new protocols. Here is how to do so:
– Define a timing_config structure with the values you measured (values are expected in µs)
– Implement the int (*format_cmd)(uint8_t *data, size_t data_len, uint32_t remote_code, uint32_t device_code, rf_command_t command); function, which is supposed to generate the final frame in the pre-allocated *data array of data_len bytes from the remote ID, the device ID and the command
– Declare an rf_protocol_driver structure with a short and long name for the protocol, the pointer to the format_cmd() function and to the timing_config structure, the max allowed remote and device IDs, and the actual parameters this protocol needs (most likely PARAM_REMOTE_ID | PARAM_DEVICE_ID | PARAM_COMMAND)
That’s all ! You should be able to build rf-ctrl and control your plug with it. If it does not work, do not hesitate to check the generated signal with your oscilloscope or digital analyzer.
Let’s get back to the main topic. To control my heaters, I thought I would buy plugs from one of the brands I already reversed, and went to buy the “auchan” ones. Unfortunately, they were still selling 433MHz plugs under the same name, but the underlying supplier had clearly changed. I decided to buy three of them anyway, but knew I would have to reverse yet another protocol, with the risk it might have used some kind of rolling codes… Fortunately, it did not, and was pretty straightforward to understand. For your information it’s the protocol I called “auchan2”.
Now regarding the actual setup, I used the well-known TP-Link TL-WR703N router running OpenWrt and a $1 433MHz transmitter (again, like this one) connected, through a 2V -> 5V level shifter, to the GPIO 7 of the router. I wrote the needed Makefile to build rf-ctrl as an OpenWrt package, and also created a kernel driver that generates the proper OOK signal on GPIO 7 once fed with the correct timings and data. This driver, called ook-gpio, is directly provided as an OpenWrt package on my GitHub. Since the WR703N does not have much free space, I chose to build a special firmware for it with everything in it, removing what was useless. Once the firmware was flashed, I verified that I was indeed able to control my heaters. But to do that remotely, I had to connect through SSH and use my command line tool, which looked like something that could be improved. So I made a little Web UI called Home-RF, a small shell script that controls rf-ctrl by generating a web page with configurable presets. It looks like this:
The idea is that you can add presets for devices like plugs, rolling shutters or chimes, and they will be displayed like a remote. As a bonus, it also supports WakeOnLan compatible computers (using etherwake). There is a simple preset editor included in the interface, as well as an advanced panel for manually controlling rf-ctrl or etherwake. Home-RF will be nicely displayed on a PC, as well as on mobile phones. It is available here on my GitHub, and can be built as an OpenWrt package.
At that point, I rebuilt a firmware with Home-RF inside, and flashed it. I’m using a VPN at home, so I do not care about authentication directly in Home-RF. However, if you plan to use it remotely, do not forget to add some kind of access control on top of it (htaccess, SSH, VPN…) !
In order to build your own RF gateway, you will need:
The provided instructions assume you are working on a PC running Linux.
The schematic for the level shifter is the following:
– Solder the MOSFET and the resistor to match the schematic above
– Use one pad of the PCB as Ground, and solder three wires on Output, +5V Transmitter, and GND Transmitter
– Either solder the 3 pins connector to the other end of the wires, or solder the RF transmitter directly (remove the male pins of the transmitter if any)
– Open the WR703N router, and look for the four signals below:
– Solder one end of four wires on these signals, and the other end to the level shifter previously made
– Solder the 17.3 cm long wire to the antenna pad of the transmitter and put Kapton everywhere to prevent any short-circuit (I tried without antenna at first, that’s why it is missing on my picture)
– Put the board back in its casing, use its reset hole to get the antenna out of it (you will have to bend the antenna to do so, so make sure it does not push the reset button), and close it
You should get something like this:
I attached a prebuilt Barrier Breaker (14.07) OpenWrt firmware with all the tools in it, but it is more fun to build it yourself:
– Create your root folder for the build, for instance my-gateway:
$ mkdir my-gateway
– Go to that folder, and checkout a Barrier Breaker OpenWrt tree (I did not try Chaos Calmer, so let me know if it works):
$ cd my-gateway
$ git clone -b barrier_breaker git://github.com/openwrt/openwrt.git
– Checkout rf-ctrl, Home-RF and ook-gpio:
$ git clone https://github.com/jcrona/rf-ctrl.git
$ git clone https://github.com/jcrona/home-rf.git
$ git clone https://github.com/jcrona/ook-gpio.git
– Create the packages folders in OpenWrt:
$ mkdir -p openwrt/package/utils/home-rf/files
$ mkdir -p openwrt/package/utils/rf-ctrl/src
$ mkdir -p openwrt/package/kernel/ook-gpio
– Copy the packages content:
$ cp -a home-rf/www openwrt/package/utils/home-rf/files/
$ cp home-rf/OpenWrt/Makefile openwrt/package/utils/home-rf/
$ cp rf-ctrl/* openwrt/package/utils/rf-ctrl/src/
$ cp rf-ctrl/OpenWrt/Makefile openwrt/package/utils/rf-ctrl/
$ cp -a ook-gpio/* openwrt/package/kernel/ook-gpio/
– Update external feeds in OpenWrt and add etherwake to the build system:
$ cd openwrt
$ ./scripts/feeds update -a
$ ./scripts/feeds install etherwake
– Download the attached home-rf_openwrt.config into the my-gateway folder, and use it:
$ cp ../home-rf_openwrt.config .config
$ make oldconfig
– Build the OpenWrt firmware
$ make
– You should have your firmware ready in my-gateway/openwrt/bin/ar71xx/.
If you have any issue building the mac80211 package, it might be because the build system failed to clone the linux-firmware Git repository. In that case, download the linux-firmware-2014-06-04-7f388b4885cf64d6b7833612052d20d4197af96f.tar.bz2 archive from here, copy it into the my-gateway/openwrt/dl/ folder, and restart the build.
Now, you need to flash your WR703N router. If you never flashed OpenWrt before on your router, use openwrt-ar71xx-generic-tl-wr703n-v1-squashfs-factory.bin as explained here. Otherwise, use openwrt-ar71xx-generic-tl-wr703n-v1-squashfs-sysupgrade.bin with the sysupgrade tool.
At that point, you should have your router up and running. You still need to configure it like a regular OpenWrt router, as explained here. You can, for instance, configure it in WiFi station mode, so that you can find the best place to reach all your 433MHz devices.
Once properly configured, open a browser and go to http://<your_router_ip>/home-rf. If everything went well, you will get the Home-RF interface ready to be configured !
So now I’m able to control my electric heaters from my phone for around $20, and I hope you will be able to do the same with your own 433MHz devices. All the discussed tools are available on my GitHub. I will be happy to extend the list of supported protocols in rf-ctrl, so feel free to add more.
If you want to play around, try the “scan” mode of rf-ctrl ! It sends all possible frames within a given range of remote IDs, device IDs, and protocols.
That’s all for now !
I recently bought a DVB-T dongle containing the Realtek RTL2832U and Raphael Micro R820T chips with the intent to use it as a Software-Defined Radio (SDR) receiver. These dongles are incredible because for about $10, you can tune in to frequencies between 24 and 1766MHz and listen to a wide range of devices and signals, provided you have a proper antenna (and a down-/up- converter if you want to listen outside of this range). The device, pictured below, is truly very simple: the back consists solely of a couple lines that could probably not be routed on the top layer of the PCB.
As a first project, I decided to look into the 433MHz frequency, as others have also successfully done (see here, here, and here for instance), but decided to focus on the methodology and the tools available, rather than recovering a specific device’s key, since I didn’t have one lying around. This post describes the manual process I followed with existing tools, as well as a basic MATLAB script that I wrote interfacing with the RTL device which automates the binary signal recovery process.
UPDATE: There is some good discussion of this post going on at Hackaday, RTL-SDR, and Reddit, which also contain a few more pointers for this kind of thing. My response to some of the points raised can be found here. A good alternative to MATLAB which I had not considered is Octave, which apparently interfaces well with GNU Radio.
As mentioned above, I did not have a device transmitting at 433MHz, so instead I used a typical cheap MX-FS-03V RF transmitter (pictured below) bought off of eBay, connected to an Arduino Uno. I used the rc-switch library, which appears to be pretty popular, with a lot of forks on GitHub. My code’s loop simply calls mySwitch.send("010010100101") followed by a delay of 1 second and makes no other calls to the library besides enabling transmission on the appropriate Arduino pin.
The goal of the project was to uncover the details of the protocol (and the value transmitted) before looking at the library code to verify it. To this end, I installed SDR# to visualize and record the signal, as well as Audacity to inspect the produced WAV file. I additionally installed the rtl-sdr and rtl_433 libraries which contain command-line utilities for automation (Windows binaries can be found here and here).
Having programmed the Arduino and left it to constantly transmit, my first step was to fire up SDR# to visually inspect the signal. The figures below show SDR#’s spectrum analyzer and waterfall graphs centered at 433MHz. The spectrum analyzer shows a consistent noise level across frequencies when the transmitter is silent, and also indicates a few DC bias spikes. Moreover, the waterfall illustrates that the transmitter output is not filtered and produces noise/energy across many unwanted frequencies. [UPDATE: Per a suggestion here, reducing the gain helps remove the aliases, but does not entirely eliminate them.]
This can be seen even more clearly below, when a transmission is occurring, where we can also identify that the strongest signal is actually at 434MHz.
After selecting the frequency, I recorded 10 seconds of the signal which came out as an astonishingly large 110MB WAV file! Opening up the recording on Audacity, as shown below, we can identify 10 seemingly identical, equally spaced transmissions 1 second apart, with the exception of the 8th one.
We ignore the anomaly for now (as a closer inspection indicates it is simply truncated, but otherwise the same as other transmissions), and focus on an individual section:
Once more we find 10 identical transmissions within each section, so zooming further we can clearly identify the modulation as a type of on-off keying (OOK) where 0s are short HIGH bursts followed by long periods of silence, and 1s are long HIGH bursts followed by small periods of silence.
Note of course that the encoding could be reversed, but it is reasonable to assume that it is not (and our knowledge of what is being transmitted tells us we are right!): the signal appears to be 0100101001010. This is indeed what we transmitted, but there is a spurious 0 at the end. Though this could be a checksum, flipping the last bit or removing it does not alter the value, hence we can assume it is simply an End-of-Message (EOM) value. Looking at the individual signals for 0 and 1, we see that the pulse length for a 0 is 350μs long, and it is 3 times as long for a 1.
Looking at the setup code, we see that the pulse length is indeed 350μs long, and each message is repeated 10 times, each of which is followed by a sync message. Moreover, for the default protocol, a 0 is represented as 1 HIGH, 3 LOWs, while a 1 is the reverse. Success!
Even though rtl_433 readily decodes this message for us, when I found out that MATLAB has a package for RTL-SDR (which needs the Communications System Toolbox), I thought I’d try it out. As a first step, I tried the spectrum analyzer example, just to ensure that everything works. 433.989MHz gave the strongest signal, and behaves as expected both during silence and transmission:
The data is output in I/Q format with values between -1 and 1, but I did not want to write a demodulator, so I instead took the real part, corresponding to the in-phase component, which proves to be sufficient for our purposes. [UPDATE: An alternative is taking the modulus of the complex value. This has the added benefit of not needing the Hilbert transform below, as this comment mentions. I can confirm that setting rdata = abs(data); and binary(smoothed >= high_thres) = 1; in the code works without further changes.] As can be seen in the figure below and left, the output is very noisy, so I immediately applied a Savitzky-Golay filter, which was chosen to be cubic for data frames of length 41, as in the MATLAB example. As the picture below and to the right shows, the filtering is very effective.
Having reduced the noise, the next step was to calculate the envelope of the signal, which in MATLAB is implemented by taking the modulus of the Hilbert transform, as also explained here. The figures below show what that looks like for the overall signal, as well as for a specific transmission of our 10 bits. As can be seen, during the transmission the envelope fluctuates a bit, but is most frequently above 1. When the transmission is not occurring, the value remains below 0.1, but this is not pictured here.
The conversion to a binary signal is straightforward: if the magnitude of the above quantity is above 0.5, the signal is considered to be at a logical HIGH, and if it is below 0.5 it is a logical LOW. Zooming into one of the transmissions shows us that the digital pulse produced is as expected, without noise:
The basic idea to automatically detect whether a signal is a 0 or 1 is simple: count the number of consecutive samples that were HIGH, and if they are close to the transmission pulse length of a 0 or a 1, print that value! There were a few intricacies in debouncing (where the code basically skips over a few LOWs in between HIGHs) and in setting the appropriate thresholds for what counts as “close enough”, but in the end the code was able to accurately recover all transmitted bits. That said, I expect that changes to the parameters will need to be made for other hardware, depending on factors such as the antennas and power of transmission.
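The run-counting idea can be sketched in a few lines. This is not the author's MATLAB script: it is a simplified JavaScript illustration that works on an already thresholded 0/1 signal, skips debouncing entirely, and assumes a 1 is three times as long as a 0, as observed above. The names decodeHighRuns and synth, the samples-per-pulse value, and the tolerance are assumptions chosen for the example.

```javascript
// samples: array of 0/1 values (the thresholded signal).
// shortLen: expected number of HIGH samples for a 0; a 1 is assumed to be
// three times as long. tol is a relative tolerance on the run length.
function decodeHighRuns(samples, shortLen, tol) {
  var bits = '';
  var run = 0;
  function flush() {
    if (run === 0) return;
    if (Math.abs(run - shortLen) <= tol * shortLen) {
      bits += '0';
    } else if (Math.abs(run - 3 * shortLen) <= tol * 3 * shortLen) {
      bits += '1';
    }
    // Runs matching neither length are ignored as noise.
    run = 0;
  }
  for (var i = 0; i < samples.length; i++) {
    if (samples[i] === 1) run++;
    else flush();
  }
  flush(); // handle a run that reaches the end of the buffer
  return bits;
}

// Build a synthetic signal for the demo: each bit lasts 4 * shortLen samples.
function synth(bits, shortLen) {
  var out = [];
  for (var i = 0; i < bits.length; i++) {
    var high = bits[i] === '1' ? 3 * shortLen : shortLen;
    for (var h = 0; h < high; h++) out.push(1);
    for (var l = 0; l < 4 * shortLen - high; l++) out.push(0);
  }
  return out;
}

console.log(decodeHighRuns(synth('010010100101', 10), 10, 0.3));
```

On real captures the tolerance and the samples-per-pulse value depend on the sample rate and the hardware, which is the same tuning caveat mentioned above.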
RTL-SDR definitely opens up many possibilities. Even though this post was a “toy example”, it has real-world implications as plenty of devices operate freely at 433MHz and other frequencies, as explained in the introduction. Although MATLAB is not always easy to work with, it has tremendous capabilities, and the fact that it interfaces with the dongle is a great feature.
I believe that the RTL-SDR community would greatly benefit from more open-source projects using MATLAB, so I have made my code available on GitHub, if you would like to try it out for yourself. As mentioned above, it might need some tweaking based on your hardware, but I hope such changes will be minimal. If you have any comments or improvements, feel free to contact me!
My initial plan was to use GNU Radio on my new Raspberry Pi 2, but despite its extra processing power, I found that it could not adequately do signal processing, even for FM frequencies, and often underflowed. If you are interested in going down that route, you might want to look at this post containing installation instructions, and gqrx as a *nix alternative to SDR# (it’s gqrx-sdr under the repositories). Also take a look at this forum discussion if you get a BadMatch error, and at this post detailing how to approach the analysis using GNU Radio. Finally, if you, like me, don’t have an Ethernet plug available, but have an Android phone that can tether (even if it is using Wi-Fi), connect it to your Pi’s USB, set the connection mode to “Media” and follow the instructions here!
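[Introduction] In the "big data era", data is king! Neither data mining nor the currently hot field of deep learning can do without "big data". Large companies usually have their own data, but for startups, university teachers, and students, “Where can I get large datasets open to the public?” is a question that has to be faced.
According to the investigation and analysis by Venustech ADLab, the Mirai botnet has two attacks on record, one of which targeted the website of security journalist Brian Krebs, with attack traffic reaching 665Gbps.
(2) On September 20, 2016, KrebsOnSecurity.com, the website of well-known security journalist Brian Krebs, was hit by a large-scale DDoS attack that peaked at 665Gbps. Brian Krebs surmised that the attack was launched by the Mirai botnet.
In early October 2016, researchers at Imperva Incapsula analyzed 49,657 infected device sources they had identified and found that the main infected devices were CCTV cameras, DVRs, and routers. Based on the IP addresses of these devices, the infection spanned 164 countries or regions, with the most infections in Vietnam, Brazil, the United States, mainland China, and Mexico.
(3) Forcibly remove other mainstream IoT bots, eliminating competitors to monopolize resources. For example, it removes QBOT, Zollard, Remaiten Bot, anime Bot, and other bots.
The bot program on an infected device scans devices on the Internet using a random strategy and uploads each successfully guessed username, password, IP address, and port in a fixed format to ScanListen. ScanListen parses this information and hands it to the Load module, which uses it to log in to the device and infect it. There are three infection methods: echo, wget, and tftp. All three push to the target device a tiny module with download capability, which is named dvrHelper once it is on the target. Finally, dvrHelper downloads the bot remotely and runs it; the bot in turn performs Telnet scanning and password guessing, and so the cycle spreads through the network. This infection method is extremely effective: Anna-senpai once received 500 successful brute-force results per second.
Mirai uses a technique called memory scraping to kill other malware on the device. Specifically, it searches memory for signatures of QBOT, UPX, the Zollard worm, and Remaiten bot and kills the matching processes, so that it has the device's resources to itself.
The Mirai bot has more than 60 built-in usernames and passwords, stored in encrypted form. The encryption is a simple single-byte XOR applied several times; the key is 0xDEADBEEF and the decryption key is 0xEFBEADDE.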
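A repeated single-byte XOR of this kind is weak obfuscation, which the following sketch illustrates. It is a generic illustration of the scheme as described in the paragraph above, not code taken from Mirai: the function name and the sample string are made up, and the assumption is that each data byte is XORed with each of the four key bytes in turn.

```javascript
var KEY = 0xDEADBEEF;

// XOR every byte with each of the four key bytes in turn. Because XOR is
// associative and commutative, this equals one XOR with the XOR of the four
// key bytes, and applying the function twice restores the original data.
function xorWithKeyBytes(bytes, key) {
  var k = [
    (key >>> 24) & 0xff,
    (key >>> 16) & 0xff,
    (key >>> 8) & 0xff,
    key & 0xff
  ];
  return bytes.map(function (b) {
    return b ^ k[0] ^ k[1] ^ k[2] ^ k[3];
  });
}

function toBytes(text) {
  return text.split('').map(function (c) { return c.charCodeAt(0); });
}

function toText(bytes) {
  return bytes.map(function (b) { return String.fromCharCode(b); }).join('');
}

var obfuscated = xorWithKeyBytes(toBytes('example'), KEY);
console.log(obfuscated);
console.log(toText(xorWithKeyBytes(obfuscated, KEY)));
```

Under this assumption the four key bytes collapse into the single byte 0x22, and the byte-swapped value 0xEFBEADDE collapses into the same byte, so the "encryption" and "decryption" keys are interchangeable and the obfuscation is trivial to undo when analyzing a sample.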
Command type | Index | Valid | Description |
TABLE_SCAN_CB_DOMAIN | 18 | yes | domain to connect to |
TABLE_SCAN_CB_PORT | 19 | yes | Port to connect to |
TABLE_SCAN_SHELL | 20 | yes | ‘shell’ to enable shell access |
TABLE_SCAN_ENABLE | 21 | yes | ‘enable’ to enable shell access |
TABLE_SCAN_SYSTEM | 22 | yes | ‘system’ to enable shell access |
TABLE_SCAN_SH | 23 | yes | ‘sh’ to enable shell access |
TABLE_SCAN_QUERY | 24 | yes | echo hex string to verify login |
TABLE_SCAN_RESP | 25 | yes | utf8 version of query string |
TABLE_SCAN_NCORRECT | 26 | yes | ‘ncorrect’ to fast-check for invalid password |
TABLE_SCAN_PS | 27 | no | “/bin/busybox ps” |
TABLE_SCAN_KILL_9 | 28 | no | “/bin/busybox kill -9 “ |
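zero (1 byte) | IP address (4 bytes) | Port (2 bytes) | Username length (4 bytes) | Username (multi-byte) | Password length (4 bytes) | Password (multi-byte) |
Mirai's attack types include UDP, TCP, and HTTP attacks, as well as a newer GRE attack. The GRE attack was the main form of attack against KrebsOnSecurity.com, the website of well-known security journalist Brian Krebs. The attack initialization code is as follows:
type Attack struct {
    Targets map[uint32]uint8 // Prefix/netmask
    Flags map[uint8]string // key=value
}
Number of targets (4 bytes) | IP address (4 bytes) | MASK (1 byte) | IP address (4 bytes) | MASK (1 byte) | IP address … MASK … |
Attack type (32-bit) | Type value | Attack function |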
ATK_VEC_UDP | 0 | attack_udp_generic |
ATK_VEC_VSE | 1 | attack_udp_vse |
ATK_VEC_DNS | 2 | attack_udp_dns |
ATK_VEC_UDP_PLAIN | 9 | attack_udp_plain |
ATK_VEC_SYN | 3 | attack_tcp_syn |
ATK_VEC_ACK | 4 | attack_tcp_ack |
ATK_VEC_STOMP | 5 | attack_tcp_stomp |
ATK_VEC_GREIP | 6 | attack_gre_ip |
ATK_VEC_GREETH | 7 | attack_gre_eth |
ATK_VEC_PROXY | 8 | attack_app_proxy (removed) |
ATK_VEC_HTTP | 10 | attack_app_http |
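The GRE attack listed here is also the main attack type used in the September 20 attack on security journalist Brian Krebs.
(2) After a successful login, try running /bin/busybox ps to confirm that busybox commands can be executed.
(3) Remotely run /bin/busybox cat /proc/mounts; to discover readable and writable directories.
(5) Next, run the command "/bin/busybox cat /bin/echo\r\n" to obtain the architecture of the current device.
(3) Use a port scanner to check whether your device exposes SSH (22), Telnet (23), or HTTP/HTTPS (80/443). If it does, ask your technical staff to disable SSH and Telnet; if circumstances allow, also disable HTTP/HTTPS (to prevent similar attacks from infecting the device through its web interface).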
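The port check recommended above can be approximated for your own device with a few lines of Python. This is a minimal sketch, not a replacement for a real port scanner: the function names are hypothetical, it only tests whether a TCP connection succeeds, and the address in the example is a placeholder you would replace with your device's own IP.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as not exposed.
        return False


def exposed_services(host, ports=(22, 23, 80, 443)):
    """Return the subset of ports (SSH, Telnet, HTTP, HTTPS by default) that accept connections."""
    return [port for port in ports if is_port_open(host, port)]


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder for a device on your own network.
    print(exposed_services("192.168.1.10"))
```

Any port reported here for a camera, DVR, or router is a candidate for being disabled or firewalled.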
Networked hard drives are super convenient. You can access files no matter what computer you’re on — and even remotely.
But they’re expensive. Unless you use the Raspberry Pi.
If you happen to have a few hard drives lying around, you can put them to good use with a Raspberry Pi by creating your own very cheap NAS setup. My current setup is two 4TB hard drives and one 128GB flash drive, connected to my network and accessible from anywhere using the Raspberry Pi.
Here’s how.
For starters, you need an external storage drive, such as an HDD, SSD or a flash drive.
You also need a Raspberry Pi. Models 1 and 2 work just fine for this application but you will get a little better support from the Raspberry Pi 3. With the Pi 3, you’re still limited to USB 2.0 and 100Mbps via Ethernet. However, I was able to power one external HDD with a Pi 3, while the Pi 2 Model B could not supply enough power to the same HDD.
In my Raspberry Pi NAS, I currently have one powered 4TB HDD, one non-powered 4TB HDD and a 128GB flash drive mounted without issue. To use a Pi 1 or 2 with this, you may want to consider using a powered USB hub for your external drives or using a HDD that requires external power.
Additionally, you need a microSD card — 8GB is recommended — and the OpenMediaVault OS image, which you can download here.
To install the operating system, we will use the same method used for installing any OS without NOOBS. In short:
More detailed installation instructions can be found here for both Windows and Mac. Just substitute the Raspbian image with OpenMediaVault.
After the image has been written to the SD card, connect peripherals to the Raspberry Pi. For the first boot, you need a keyboard, monitor and a local network connection via Ethernet. Next, connect power to the Raspberry Pi and let it complete the initial boot process.
Once that is finished, use the default web interface credentials to sign in. (By default, the username is admin and the password is openmediavault.) This will provide you with the IP address of the Raspberry Pi. After you have that, you will no longer need a keyboard and monitor connected to the Pi.
Connect your storage drives to the Raspberry Pi and open a web browser on a computer on the same network. Enter the IP address into the address bar of the browser and press return. Enter the same login credentials again (admin for the username and openmediavault for the password) and you will be taken to the web interface for your installation of OpenMediaVault.
The first thing you will want to do to get your NAS online is to mount your external drives. Click File Systems in the navigation menu to the left under Storage.
Locate your storage drives, which will be listed under the Device column as something like /dev/sda1 or /dev/sdc2. Click one drive to select it and click Mount. After a few seconds have passed, click Apply in the upper right corner to confirm the action.
Repeat this step to mount any additional drives.
Next, you will need to create a shared folder so that the drives can be accessed by other devices on the network. To do this:
Finally, to access these folders and drives from an external computer on the network, you need to enable SMB/CIFS.
Click SMB/CIFS under Services in the left navigation pane and click the toggle button beside Enable. Click Save and Apply to confirm the changes.
Next, click on the Shares tab near the top of the window. Click Add, select one of the folders you created in the dropdown menu beside Shared folder and click Save. Repeat this step for each shared folder you created.
Now that your NAS is up and running, you need to map those drives from another computer to see them. This process is different for Windows and Mac, but should only take a few seconds.
To access a networked drive on Windows, open File Explorer and click This PC. Select the Computer tab and click Map network drive.
In the dropdown menu beside Drive choose an unused drive letter. In the Folder field, input the path to the network drive. By default, it should look something like \\RASPBERRYPI\[folder name]. (For instance, one of my folders is HDD, so the folder path is \\RASPBERRYPI\HDD.) Click Finish and enter the login credentials. By default, the username is pi and the password is raspberry. If you changed or forgot the login for the user, you can reset it or create a new user and password in the web interface under User in Access Rights Management.
To open a networked folder in OS X, open Finder and press Command + K. In the window that appears, type smb://raspberrypi or smb://[IP address] and click Connect. In the next window, highlight the volumes you want to mount and click OK.
You should now be able to see and access those drives within Finder or File Explorer and move files on or off the networked drives.
There are tons of settings to tweak inside OpenMediaVault, including the ability to reboot the NAS remotely, setting the date and time, power management, a plugin manager and much, much more. But if all you need is a network storage solution, you’ll never need to dig any deeper.
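MVP is short for Minimum Viable Product. An MVP presents the core concept of a product at the lowest possible cost: build a usable prototype in the fastest, simplest way, use it to convey the effect the final product is meant to achieve, and then refine the details through iteration.
1. Propose an idea and build quickly
In just a day and a half, the global search product went through three rapid iterations: from accepting only Chinese characters as keywords, to supporting search directly by pinyin, to automatically categorizing keywords and modules, which improved the retrieval speed of the whole search tool.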
Deep Learning is a sub-field of Machine Learning that has its own peculiar ways of doing things. Here are 10 lessons that we’ve uncovered while building Deep Learning systems. These lessons are a bit general, although they do focus on applying Deep Learning in an area that involves structured and unstructured data.
The one tried and true way to improve accuracy is to have more networks perform the inference and combine the results. In fact, techniques like Dropout are a means of creating “Implicit Ensembles” where multiple subsets of superimposed networks cooperate using shared weights.
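As a rough illustration of combining the results of several networks, the sketch below averages the class-probability vectors produced by three models and picks the most likely class. The probability values are made up for the example and the helper names are hypothetical; simple averaging is only one way of combining predictions.

```python
def ensemble_average(predictions):
    """Average the class-probability vectors produced by several models."""
    n_models = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / n_models for c in range(n_classes)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Illustrative outputs of three models for one two-class sample.
    model_outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    averaged = ensemble_average(model_outputs)
    print(averaged, "-> class", argmax(averaged))
```

Here the first model alone would have chosen class 0, while the ensemble settles on class 1.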
2. Seek Problems where Labeled Data is Abundant
The current state of Deep Learning is that it works well only in a supervised context. The rule of thumb is around 1,000 samples per rule. So if you are given a problem where you don’t have enough data to train with, try considering an intermediate problem that does have more data and then run a simpler algorithm with the results from the intermediate problem.
3. Search for ways to Synthesize Data
Not all data is nicely curated and labeled for machine learning. Many times you have data that are weakly tagged. If you can join data from disparate sources to achieve a weakly labeled set, then this approach works surprisingly well. The most well known example is Word2Vec where you train for word understanding based on the words that happen to be in proximity with other words.
4. Leverage Pre-trained Networks
One of the spectacular capabilities of Deep Learning networks is that bootstrapping from an existing pre-trained network and using it to train into a new domain works surprisingly well.
5. Don’t forget to Augment Data
Data usually have meaning that a human may be aware of but that a machine is unlikely to discover. One simple example is a time feature. From the perspective of a human, the day of the week, whether it is a holiday, or the time of day may be important attributes; however, a Deep Learning system may never be able to surface that if all it is given is seconds since the Unix epoch.
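The time-feature example can be made concrete with a small sketch that expands a Unix timestamp into attributes a human would consider meaningful. The function name, the choice of UTC, and the holiday set are assumptions for illustration; a real system would use the local time zone and calendar of the data.

```python
from datetime import date, datetime, timezone


def time_features(epoch_seconds, holidays=frozenset()):
    """Expand seconds since the Unix epoch into human-meaningful features (UTC)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {
        "day_of_week": dt.weekday(),        # Monday == 0 ... Sunday == 6
        "hour": dt.hour,
        "is_weekend": dt.weekday() >= 5,
        "is_holiday": dt.date() in holidays,
    }


if __name__ == "__main__":
    # Hypothetical holiday calendar with a single entry.
    print(time_features(0, holidays={date(1970, 1, 1)}))
```

These derived columns can then be fed to the network alongside, or instead of, the raw timestamp.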
6. Explore Different Regularizations
L1 and L2 regularizations are not the only regularizations that are out there. Explore the different kinds and perhaps look at different regularizations per layer.
7. Embrace Randomness
There are multiple techniques to initialize your network prior to training. In fact, you can get very far just training the last layer of a network with the previous layers being mostly random. Consider using this technique to speed up your hyper-parameter tuning explorations.
8. End-to-End Deep Learning is a Hail Mary Play
A lot of researchers love to explore end-to-end deep learning research. Unfortunately, the most effective use of Deep Learning has been to couple it with other techniques. AlphaGo would not have been successful if Monte Carlo Tree Search had not been employed. If you want to make an impact in the academic community, then end-to-end Deep Learning might be your gamble. However, in a time-constrained industrial environment that demands predictable results, you had best be more pragmatic.
9. Resist the Urge to Distribute
If you can, try to avoid using multiple machines (with the exception of hyper-parameter tuning). Training on a single machine is the most cost effective way to proceed.
10. Convolution Networks work pretty well even beyond Images
Convolution Networks are clearly the most successful kind of network in the Deep Learning space. However, ConvNets are not only for images; you can use them for other kinds of features (e.g. voice, time series, text).
That’s all I have for now. There are certainly many more lessons. Let me know if you stumble on others.
You can find more details of these individual lessons at http://www.deeplearningpatterns.com
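To see why the same operation applies beyond images, the sketch below runs a one-dimensional sliding dot product over a short time series, which is the core computation of a 1D convolutional layer. It is a bare illustration with a hand-picked kernel; in a real network the kernel weights are learned and there are many channels and layers.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation: slide the kernel over the signal and take dot products."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


if __name__ == "__main__":
    series = [0, 0, 1, 1]       # a step in a time series
    edge_kernel = [-1, 1]       # hand-picked kernel that responds to increases
    print(conv1d(series, edge_kernel))
```

The output is non-zero only where the series rises, just as an image filter responds to an edge.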
Originally published at blog.alluviate.com.
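How do you fetch a web page? You simply retrieve the page content for a given URL. What we see in the browser is a series of beautiful pages, but that is only how the browser renders it; underneath it is HTML code plus JS and CSS. If a web page were a person, HTML would be the skeleton, JS the muscles, and CSS the clothes. The most important part therefore lives in the HTML, so let's write an example that pulls down a web page.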
import urllib2
response = urllib2.urlopen("http://www.baidu.com")
print response.read()
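The result is the same as right-clicking and viewing the page source in a browser such as Chrome. urllib2 is a built-in Python library that simplifies the use of httplib (urllib2.urlopen is roughly equivalent to HttpURLConnection in Java). Where there is a 2 there must also be a urllib: urllib2 can accept a Request instance to set the headers of a URL request, whereas urllib only accepts a URL, which means you cannot spoof your User-Agent string with it. In Python 3.x, urllib2 became urllib.request. Next, we use it to pretend to be an iPhone 6 browser and send a GET request.
from urllib import request

req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    print('Data:', f.read().decode('utf-8'))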
<link rel="apple-touch-icon-precomposed" href="https://gss0.bdstatic.com/5bd1bjqh_Q23odCf/static/wiseindex/img/screen_icon.png"/>
<meta name="format-detection" content="telephone=no"/>
from urllib import request, parse

print('Login to weibo.cn...')
email = input('Email: ')
passwd = input('Password: ')
# form fields sent with the POST request
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'weibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])
req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')
# passing data= turns the request into a POST
with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
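Their relationship: CookieJar is the base class of FileCookieJar, which in turn is the base class of MozillaCookieJar and LWPCookieJar.
from urllib import request
from http.cookiejar import CookieJar

cookie = CookieJar()
cookie_support = request.HTTPCookieProcessor(cookie)  # cookie handler
opener = request.build_opener(cookie_support)
opener.open('http://www.baidu.com')
for item in cookie:
    print(item.name + ' : ' + item.value)
Result: >BAIDUID : E4DECD4AF63915B9AFF5AC28951A3DAA:FG=1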
H_PS_PSSID : 1437_18241_17944_21079_18559_21454_21406_21377_21191_21321
PSTM : 1477631558
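This uses the default CookieJar object. To persist cookies, use the save method implemented by the FileCookieJar subclasses (MozillaCookieJar or LWPCookieJar), and load them back with the load method.
headers = [('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')]

def getCookie():
    cookie = CookieJar()
    cookie_support = request.HTTPCookieProcessor(cookie)  # cookie handler
    opener = request.build_opener(cookie_support)
    opener.addheaders = headers
    # a request made through opener fills the jar with the site's cookies
    return cookie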
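Once you have a cookie you can start crawling. How do you process what you crawl? Let me introduce an "SB" tool: BeautifulSoup.
BeautifulSoup translates as "chicken soup". The current version is 4.5.1, BS4 for short, or 4SB read backwards, but there is nothing dumb about how it scrapes data. It provides simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolbox that parses a document and hands you the data you want to scrape; because it is simple, a complete application takes very little code. Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8. You do not need to think about encodings unless the document does not declare one and Beautiful Soup cannot detect it automatically; in that case you only have to state the original encoding. The official documentation covers the usage of BS in detail. Below are a few examples from "Web scraping with python" [1] to see how the soup tastes; you can read them alongside the docs. First install BS, then try it out by scraping the images at http://www.pythonscraping.com/pages/page3.html.
import re
from urllib import request
from bs4 import BeautifulSoup

html = request.urlopen('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html, 'lxml')
# print the src of every <img> whose path ends in .jpg
for pic in bs.find_all('img', {'src': re.compile(r".*\.jpg$")}):
    print(pic['src'])
Result: >../img/gifts/logo.jpg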
# get the proxy
cookie = getCookie()
with open('proxy.txt', 'w') as f:
    for page in range(1, 101):
        if page % 50 == 0:  # refresh the cookie every 50 pages
            cookie = getCookie()
        url = 'http://www.xicidaili.com/nn/%s' % page
        cookie_support = request.HTTPCookieProcessor(cookie)
        opener = request.build_opener(cookie_support)
        req = request.Request(url, headers=dict(headers))
        content = opener.open(req)  # go through the opener so the cookie is sent
        soup = BeautifulSoup(content, "lxml")
        trs = soup.find('table', id="ip_list").findAll('tr')
        for tr in trs[1:]:  # skip the header row
            tds = tr.findAll('td')
            ip = tds[1].text.strip()
            port = tds[2].text.strip()
            protocol = tds[5].text.strip()
            f.write('%s://%s:%s\n' % (protocol, ip, port))
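It took about fifteen seconds to crawl 10,000 records (depending on your machine), which means each page holds exactly 100 entries. The site has more than 1,000 pages, i.e. over 100,000 records. Always using the same cookie would certainly look suspicious (who has time to page through 1,000 pages of data?), so the cookie is refreshed every 50 pages. Having a proxy address does not guarantee that it works, since it may already be blocked. The approach is therefore to keep the proxy addresses in a hash table, delete the ones that fail validation (judging by the status code), and take a fresh entry from the table. A proxy address is used like this: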
proxy_handler = request.ProxyHandler({'http': ''})  # pick any entry from http://www.xicidaili.com/nn/2
opener = request.build_opener(proxy_handler, cookie_handler)  # plus any other handlers you need
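The idea of keeping proxies in a hash table and dropping the ones that fail validation could look roughly like the sketch below. The ProxyPool class and its method names are hypothetical, the addresses are placeholders, and treating any status other than 200 as a failure is an assumption; the actual HTTP request is left to the caller.

```python
import random


class ProxyPool:
    """Hash-based pool of proxy addresses; failed proxies are dropped."""

    def __init__(self, proxies):
        self._proxies = set(proxies)  # a set is a hash table of unique addresses

    def __len__(self):
        return len(self._proxies)

    def pick(self):
        """Return a random proxy that has not been discarded yet."""
        if not self._proxies:
            raise LookupError("no working proxies left")
        return random.choice(sorted(self._proxies))

    def report(self, proxy, status_code):
        """Record the HTTP status seen through a proxy; drop it on failure."""
        if status_code != 200:
            self._proxies.discard(proxy)


if __name__ == "__main__":
    # Placeholder addresses, not real proxies.
    pool = ProxyPool(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
    pool.report("http://10.0.0.1:8080", 403)  # blocked, so it is removed
    print(len(pool), pool.pick())
```

After each request made through a proxy, call `report` with the status code, then `pick` again for the next request.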
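I also recommend a great tool, Crawlera, which covers most needs.
Writing a crawler involves many other concerns: authentication, connection pooling, data processing, JavaScript handling, and so on. A classic crawler framework for this is Scrapy. It now supports Python 3, supports distributed crawling, uses Twisted for network communication, has a clear architecture, and includes various middleware interfaces, so it can flexibly meet all kinds of requirements.
In terms of features the two are roughly equivalent, including items 3, 5, and 6 above. The differences: Scrapy does not render JavaScript natively and needs scrapy-splash installed separately, while PySpider has JavaScript rendering support built in. PySpider ships with the pyquery selector, while Scrapy has XPath and CSS selectors, which people may know better. Scrapy is operated entirely from the command line, while PySpider has a decent WebUI. Scrapy handles deduplication of tens of millions of URLs well using a Bloom filter, whereas PySpider appears to deduplicate through a database. Finally, PySpider is easier to debug: Scrapy's default debug mode is too verbose and its warn mode too terse, and because it is an asynchronous framework an error does not stop the other tasks, so it keeps running after a failure. Overall, PySpider is simpler than Scrapy and can offer crawling as an online service, i.e. SaaS, so it is the recommended choice for a simple crawler. It is less customizable than Scrapy, though, and its community and documentation are weaker; Scrapy in turn requires more background knowledge, so building a crawler with it takes longer.
Since Google cache is unavailable for reasons you can guess, all the other methods can be used. Crawlera's distributed downloading can be covered in a dedicated article next time. Below we focus on these approaches: dynamically randomizing the user agent, disabling cookies, setting a download delay, and using proxy IPs.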
The Scrapy downloader is controlled through middleware, so proxy IP and user agent switching can be implemented with a custom middleware. After creating the project (use the scrapy startproject yourProject command; see the documentation), find settings.py inside it and put the agents and proxy IPs there first (fine when there are only a few proxy IPs; with many, store them in a database for easier management), then set the middleware name MyCustomSpiderMiddleware and its priority.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko Fedora/ Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]
PROXIES = [
    {'ip_port': '', 'user_pass': ''},
    {'ip_port': '', 'user_pass': ''},
    {'ip_port': '', 'user_pass': ''},
    {'ip_port': '', 'user_pass': ''},
    {'ip_port': '', 'user_pass': ''},
    {'ip_port': '', 'user_pass': ''},
]
# Disable cookies (enabled by default)
COOKIES_ENABLED = False
# Downloader middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'weiboZ.middlewares.MyCustomDownloaderMiddleware': 543,
}
import random
import base64
from settings import PROXIES


class RandomUserAgent(object):
    """Randomly rotate user agents based on a list of predefined ones"""

    def __init__(self, agents):
        self.agents = agents

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist('USER_AGENTS'))

    def process_request(self, request, spider):
        request.headers.setdefault('User-Agent', random.choice(self.agents))


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        proxy = random.choice(PROXIES)
        request.meta['proxy'] = "http://%s" % proxy['ip_port']
        if proxy['user_pass']:  # only send credentials when some are configured
            # b64encode replaces base64.encodestring, which was removed in Python 3.9
            encoded_user_pass = base64.b64encode(proxy['user_pass'].encode()).decode()
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
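Before crawling a website, first check whether it has a robots.txt. robots.txt appeared in 1994 and is also known as the Robots Exclusion Standard; site administrators can use it to mark content they do not want crawled. Although robots.txt has a mainstream syntax, companies differ in how they apply it. Nobody can stop you from writing your own flavor of robots.txt, but such files should still be respected even when they deviate from the mainstream. Typical fields are User-agent, Allow, and Disallow: which crawler the rules apply to, and what it may and may not access.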
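Checking a URL against robots.txt rules does not have to be done by hand; Python's standard library ships urllib.robotparser for it. The sketch below parses a small, made-up rule set (the rules and the example.com URLs are illustrative, not taken from any real site) and asks whether a crawler may fetch two paths.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


if __name__ == "__main__":
    # Illustrative rules: every crawler is barred from /private/.
    rules = ["User-agent: *", "Disallow: /private/"]
    parser = build_parser(rules)
    print(parser.can_fetch("*", "http://example.com/private/data"))
    print(parser.can_fetch("*", "http://example.com/public/page"))
```

For a live site, `set_url` followed by `read` on the same class downloads and parses the real robots.txt instead.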
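I read in the news that in April this year Dianping sued Baidu, asking the court to order the two defendants to stop their unfair competition, jointly pay Hantao 90 million yuan in economic losses plus more than 450,000 yuan spent on stopping the infringement, and publish a notice clarifying the facts to eliminate the negative impact. Anyone who uses Baidu Maps will know the feature in question (Baidu and AutoNavi have been feuding lately, so Baidu is getting bashed again): after locating you, it shows nearby merchants and review information. Let's look at Dianping's robots.txt. Looking just at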
User-agent: *
Disallow: /shop//rank_p
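Enough of that; let's look at Sina Weibo's robots protocol. It explicitly states that the content listed in Sitemap: http://weibo.com/sitemap.xml may not be accessed by Baidu, 360, Google, Sogou, Microsoft Bing, Haosou, or Shenma, and it then adds User-agent: * Disallow: /. In other words, the earlier entries are listed individually, and in theory no organization or individual is allowed to crawl this data. What data is it? Movie and music data. So rest assured, Weibo text data can be crawled. They are not stupid, though: the Weibo content that can be displayed is limited, and not everything in the database is shown.
Taking apartment hunting in Shanghai as an example, typing @上海租房 into the Weibo search box yields the following page >http://s.weibo.com/weibo/%2540%25E4%25B8%258A%25E6%25B5%25B7%25E7%25A7%259F%25E6%2588%25BF?topnav=1&wvr=6&b=1
Not bad. Looking at the source, though, there is no HTML data, so it is obviously loaded asynchronously via AJAX. To crawl it with Scrapy you would have to install scrapy-splash and change the configuration so that Splash renders the JS, and viewing the next page requires being logged in, which means adding a cookie to the headers. You can find it with the Chrome F12 developer tools after logging in, but can you be sure that a cookie tied to your own account will not get the account banned once the crawler is noticed? In fact at most 1,000 records can be shown here, the latest 1,000, so why go to all that trouble? Just search with the mobile version of Weibo instead.
Use the developer tools to inspect the network requests and search for request headers containing the name 'page'; a pattern emerges:
The Weibo post id field gets a unique constraint in the database to prevent duplicate posts. Use mblogid as the unique id, and definitely not itemid: testing shows that itemid only identifies a slot among the current day's posts. For example, if browsing is limited to 10 records there are slots 1 to 10, and itemid labels those 10 slots rather than the post itself. The mblog field also has an id attribute, which probably works the same as mblogid; try it if you are interested.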
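For forward maximum matching segmentation, the average DAT segmentation speed was 936 kB/s [5] (2006). The project uses a Python DAT implementation by a Japanese developer on GitHub, whose lookup speed reaches 2.755M/s. Lookup speed and segmentation speed are essentially the same, so the threefold difference is presumably due to optimization.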
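Whether a post is offering or seeking a room is also decided by keywords: if the post contains any of ["求租", "想租", "求到", "求从", "要租", "寻租", "寻找", "找新房子", "找房子", "找房", "寻房", "求房", "想找", "希望房"], it is labeled as seeking a room; otherwise it is labeled as offering one.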
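The keyword rule above fits in a few lines of Python. The keyword list is taken from the text; the function name and the "seeking"/"offering" labels are illustrative choices, not the project's actual code.

```python
# Keywords from the text that mark a post as looking for a room.
SEEKING_KEYWORDS = [
    "求租", "想租", "求到", "求从", "要租", "寻租", "寻找",
    "找新房子", "找房子", "找房", "寻房", "求房", "想找", "希望房",
]


def classify_post(text):
    """Label a post 'seeking' if it contains any seeking keyword, else 'offering'."""
    if any(keyword in text for keyword in SEEKING_KEYWORDS):
        return "seeking"
    return "offering"


if __name__ == "__main__":
    print(classify_post("求租浦东一室户"))
    print(classify_post("转租地铁9号线朝南主卧"))
```

A plain substring test like this is cheap but will mislabel posts that mention a keyword in passing, so treat the label as a first-pass filter.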
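These need to be converted uniformly, which the DataUtil class handles. MongoDB stores ISO (UTC) time, which is 8 hours behind Beijing time, and pymongo does not apply a time zone to datetime.datetime values, so 8 hours are subtracted manually before storing. Likewise, times read from MongoDB must be converted back to local time.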
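The 8-hour shift can be expressed with the standard datetime module instead of subtracting hours by hand. This is a sketch of the idea only, not the project's DataUtil class; the function names are hypothetical, and it assumes naive datetimes are stored and returned, as described above.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # UTC+8


def beijing_to_utc_naive(local_dt):
    """Convert a naive Beijing-time datetime to a naive UTC datetime for storage."""
    return local_dt.replace(tzinfo=BEIJING).astimezone(timezone.utc).replace(tzinfo=None)


def utc_naive_to_beijing(utc_dt):
    """Convert a naive UTC datetime read from the database back to Beijing time."""
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(BEIJING).replace(tzinfo=None)


if __name__ == "__main__":
    posted = datetime(2016, 10, 25, 17, 18)   # Beijing local time
    stored = beijing_to_utc_naive(posted)
    print(stored, utc_naive_to_beijing(stored))
```

Attaching the offset explicitly keeps the conversion correct around midnight, where a manual subtraction also has to change the date.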
> d=new Date()
> d
> d.toLocaleDateString()
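The article "Cassandra, HBase and MongoDB performance comparison" compares the three mainstream NoSQL databases in detail. The project ultimately chose MongoDB because it suits the read side of read/write-splitting scenarios, and because its shell and document model are JavaScript/JSON-based it handles JSON inserts especially well. When is MongoDB the worse choice? See WHY MONGODB IS A BAD CHOICE FOR STORING OUR SCRAPED DATA.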
cd weiboSA
scrapy crawl mblogSpider
Optional arguments: > scrapy crawl mblogSpider -a num= -a new_url=
➜ weiboZ git:(master) ✗ scrapy crawl mblogSpider -a num=10 -a new_url="http://m.weibo.cn/page/pageJson\?containerid\=\&containerid\=100103type%3D1%26q%3D%E6%B5%A6%E4%B8%9C%E7%A7%9F%E6%88%BF\&type\=all\&queryVal\=%E6%B5%A6%E4%B8%9C%E7%A7%9F%E6%88%BF\&luicode\=10000011\&lfid\=100103type%3D%26q%3D%E4%B8%8A%E6%B5%B7%E6%97%A0%E4%B8%AD%E4%BB%8B%E7%A7%9F%E6%88%BF\&title\=%E6%B5%A6%E4%B8%9C%E7%A7%9F%E6%88%BF\&v_p\=11\&ext\=\&fid\=100103type%3D1%26q%3D%E6%B5%A6%E4%B8%9C%E7%A7%9F%E6%88%BF\&uicode\=10000011\&next_cursor\=\&page\="
2016-10-29 14:41:11 [root] WARNING: 生成MongoPipeline对象
2016-10-29 14:41:11 [root] WARNING: 开始spider
2016-10-29 14:41:11 [root] WARNING: 允许插入数据的时间大于2016-10-29 14:15:05.875000
2016-10-29 14:41:13 [root] WARNING: do page1.
2016-10-29 14:41:13 [root] WARNING: do other pages.
2016-10-29 14:41:13 [root] ERROR: 编号为:E91f233Ds的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef4ri5bC6的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef3UNqMmV的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef3stkA8a的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef3pzmJ6i的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef1OBtvQr的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:Ef03Lj54z的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:EeYLU2GQd的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:EeYlBv7bn的数据插入异常
2016-10-29 14:41:13 [root] ERROR: 编号为:EeXkop2vu的数据插入异常
2016-10-29 14:41:15 [root] WARNING: 结束spider
created_at:{$gt:new Date('2016-10-20T00:00:00')},
"text" : "房子在大上海国际花园,漕宝路1555弄,距9号线合川路地铁站步行5分钟,距徐家汇站只有4站,现在转租大床,有独立卫生间,公共厨房,房租2400,平摊下来1200,有一女室友,室友宜家上班,限女生,没有物业费,包网络,水电自理@上海租房无中介 @上海租房无中介 @上海租房 @上海租房无中介联盟",
"scheme" : "http://m.weibo.cn/1641537045/EetVm3WBV?",
"created_at" : ISODate("2016-10-25T09:18:00Z")
"text" : "#上海租房##上海出租#9号线松江泗泾地铁站金地自在城,12层,步行、公交或小区班车直达地铁站。精装,品牌家具家电,主卧1800RMB/月;公寓门禁出入,房东直租,电话:13816835869,或QQ:36804408。@上海租房 @互助租房 @房天下上海租房 @上海租房无中介 @应届毕业生上海租房",
"scheme" : "http://m.weibo.cn/1641537045/Een8cAoy8?",
"created_at" : ISODate("2016-10-24T16:00:00Z")
"text" : "#上海租房# 个人离开上海:转租地铁9号线朝南主卧带大阳台,离地铁站两分钟!设备齐全,交通方便,随时入住。具体信息看图片~@上海租房 @上海租房无中介联盟 @魔都租房 帮转谢谢!",
"scheme" : "http://m.weibo.cn/1641537045/EdRpfuKuH?",
"created_at" : ISODate("2016-10-21T07:14:00Z")
"text" : "9号线桂林路 离地铁站8分钟 招女生室友哦 @上海租房 @上海租房无中介联盟 上海·南京西路",
"scheme" : "http://m.weibo.cn/1641537045/EdJ2U8Kv3?",
"created_at" : ISODate("2016-10-20T09:57:00Z")
A curated list of awesome iOS frameworks, libraries, tutorials, Xcode plugins, components and much more.
The list is divided into categories such as Frameworks, Components, Testing and others, open source projects, free and paid services. There is no pre-established order of items in each category, the order is for contribution. If you want to contribute, please read the guide.
Projects in Swift will be marked with :large_orange_diamond:, Swift Extensions will be marked with [e] and for Apple Watch projects. Feel free to add your project.
Awesome-iOS is an amazing list for people who need a certain feature on their app, so the best ways to use are:
Also see push notifications
Most of these are paid services, some have free tiers.
Other amazingly awesome lists can be found in the
Distributed under the MIT license. See LICENSE for more information.
This post is specific to general JavaScript programming. For those looking for React.JS, Angular 2.0, Python, Machine Learning, CSS, Swift… Visit the publication.
Step-by-step tutorial to build a modern JavaScript stack from scratch [5750 stars on Github] Courtesy of Jonathan Verrecchia
5 things you can do with Yarn: Yarn is a new package manager for JavaScript by Facebook. Learn how to use Yarn to increase your productivity. Courtesy of Prosper Otemuyiwa and Auth0
Practical ES6: A practical dive into ES6 and maintainable JavaScript modules [832 stars on Github]
Overview of JavaScript ES6 features (a.k.a ECMAScript 6 and ES2015+). Courtesy of Adrian Mejia
What I learned from writing six functions that all did the same thing. Courtesy of Jackson Bates and Free Code Camp
How to make a compiler with JavaScript. Courtesy of Mariko Kosaka
ES6 For Everyone: The Best Way To Learn Modern JavaScript. Courtesy of Wes Bos
80 JavaScript Interview Questions and Answers
[663 stars on Github]
Did you see the successful launch of a really cheap ARM board for only $9, the C.H.I.P. computer? It has an ARMv7 CPU with 512 MByte of main memory, 4 GByte of flash memory as disk storage, and onboard WiFi and Bluetooth as well.
With these awesome features built in, it would be a really great device for running Docker containers, if only the stock Linux kernel 4.4 had the correct modules included. But it doesn’t. What a bummer!
But after spending a lot of time building a custom Linux kernel, tweaking, and testing, I was finally able to install the latest Docker Engine for ARM on the C.H.I.P. As a result you can easily follow this tutorial and within a few minutes run your first Docker container on this cute ARM board…
Preparing your operating system and your Linux kernel to be able to run the Docker Engine efficiently can be a hard thing and can consume a lot of labor time.
Fortunately, in this tutorial I’ll show you the basic steps to get Docker running on the $9 C.H.I.P. computer, so every normal user should be able to do it on her own in a short time, even without being an expert in this area. And if you’re in a hurry you can skip most of the tutorial and go straight to the Lessons learned - TL;DR section and install Docker with just two commands.
Use a Chrome browser and flash the latest firmware and OS on your C.H.I.P. computer. For detailed instructions go to the appropriate web site at http://flash.getchip.com/.
To run Docker on the C.H.I.P. we’re using the OS image for Debian Headless 4.4, which is a server installation without any GUI and thus considerably smaller in size, so we have more space left for running apps and Docker containers.
Pro Tip: You can even see all the detailed log messages while flashing via a UART console cable:
Starting download of 6291508 bytes
downloading of 6291508 bytes finished
Flashing sparse image on partition UBI at offset 0x26800000 (ID: 10)
start 0x9a00 blkcnt 0x180 partition 0x400 size 0x7fc00
Writing at offset 0x26800000
New offset 0x27400000
........ wrote 384 blocks to 'UBI'
*****************[ FLASHING DONE ]*****************
Once the C.H.I.P. is successfully flashed you can connect it directly with a USB cable to a Mac or Linux machine. The C.H.I.P. gets its power over the USB cable and presents itself through a USB serial console driver, so you can easily connect to it.
Let’s see if we can find the booted C.H.I.P. on the USB wire:
ls -al /dev/cu.usb*
crw-rw-rw- 1 root wheel 20, 159 Sep 3 16:52 /dev/cu.usbmodem141113
Note 1: you have to wait a few minutes until the device can be detected as the C.H.I.P. has to be fully booted.
Note 2: it’s strongly recommended to use a powered USB hub, otherwise you’ll hit power problems and the C.H.I.P. may become inaccessible or shut off immediately.
Now we can connect to the ARM device via the screen command:
sudo screen /dev/cu.usbmodem141113
Alternatively, and this is my preferred way, you can attach a UART console cable (e.g. from AdaFruit), which typically shows up as a device on the Mac like /dev/cu.usbserial. With this setup you can even watch the complete boot logs of the C.H.I.P. computer and see all early boot messages from U-Boot and from loading and starting the Linux kernel. This gives you all the details in case there are any problems or issues with a homegrown kernel.
sudo screen /dev/cu.usbserial 115200
Once you get to the login message, you can use username root and password chip to log in:
Debian GNU/Linux 8 chip ttyS0
chip login: root
Linux chip 4.4.11-ntc #1 SMP Sat May 28 00:27:07 UTC 2016 armv7l
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Following the instruction here http://docs.getchip.com/chip.html#wifi-connection you can list all the available WiFi networks and then connect the C.H.I.P. to your preferred network.
nmcli device wifi list
HITRON-FEE0 Infra 11 54 Mbit/s 67 ▂▄▆_ WPA2
WLAN-R46VFR Infra 1 54 Mbit/s 65 ▂▄▆_ WPA2
My ASUS Infra 6 54 Mbit/s 64 ▂▄▆_ WPA2
WLAN-718297 Infra 1 54 Mbit/s 59 ▂▄▆_ WPA2
WLAN-MCQYPS Infra 1 54 Mbit/s 30 ▂___ WPA2
Telekom_FON Infra 1 54 Mbit/s 27 ▂___ --
Connect to the WiFi station with the SSID mySSID and password myPASSWORD; please insert your own SSID and password. In this example I’m using the SSID WLAN-R46VFR:
nmcli device wifi connect 'WLAN-R46VFR' password '**********' ifname wlan0
Once you are connected you can see the ‘*’ in front of your connected WiFi network:
nmcli device wifi list
HITRON-FEE0 Infra 11 54 Mbit/s 67 ▂▄▆_ WPA2
My ASUS Infra 6 54 Mbit/s 64 ▂▄▆_ WPA2
WLAN-718297 Infra 1 54 Mbit/s 59 ▂▄▆_ WPA2
WLAN-MCQYPS Infra 1 54 Mbit/s 30 ▂___ WPA2
Telekom_FON Infra 1 54 Mbit/s 27 ▂___ --
* WLAN-R46VFR Infra 1 54 Mbit/s 100 ▂▄▆█ WPA2
And the C.H.I.P. should have got an IP address from the DHCP server:
ifconfig wlan0
wlan0 Link encap:Ethernet HWaddr cc:79:cf:20:6d:d8
inet addr: Bcast: Mask:
inet6 addr: fe80::ce79:cfff:fe20:6dd8/64 Scope:Link
inet6 addr: 2003:86:8c18:1a37:ce79:cfff:fe20:6dd8/64 Scope:Global
RX packets:119 errors:0 dropped:1 overruns:0 frame:0
TX packets:102 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:24656 (24.0 KiB) TX bytes:16973 (16.5 KiB)
Now we’re connected to the network and can access the internet and the C.H.I.P. can be reached from our Mac or Linux machine.
Here we have to use the same username root and password chip to log in via SSH:
ssh-keygen -R
ssh-copy-id root@
Finally we can login to the C.H.I.P. computer via SSH:
ssh root@
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Jan 1 00:32:25 1970
-bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
As a first step we’d like to check the current Linux kernel version and operating system.
Kernel version:
uname -a
Linux chip 4.4.11-ntc #1 SMP Sat May 28 00:27:07 UTC 2016 armv7l GNU/Linux
Operating system:
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION="8 (jessie)"
BUILD_ID=Wed Jun 1 05:34:36 UTC 2016
VARIANT="Debian on C.H.I.P"
In order to install Docker I’ve prepared a complete installation script which can be downloaded and executed in a single command line. If you’re interested in the details you should check the script at GitHub.
# install Docker
curl -sSL https://github.com/DieterReuter/arm-docker-fixes/raw/master/002-fix-install-docker-on-chip-computer/apply-fix-002.sh | bash
At the end of running the install script we’ll see that some errors occurred and that the Docker Engine failed to start.
Errors were encountered while processing:
E: Sub-process /usr/bin/dpkg returned an error code (1)
This is OK for now as it just indicates the default Linux kernel isn’t able to run Docker on the C.H.I.P. and we have to build and install a custom Linux kernel which has all the necessary kernel settings for Docker enabled.
If you’re interested in analyzing these errors in more detail you can run the command systemctl status docker.service and you’ll get more detailed log messages from systemd:
root@chip:~# systemctl status docker.service -l
● docker.service - Docker Application Container Engine
Loaded: loaded (/etc/systemd/system/docker.service; enabled)
Active: failed (Result: exit-code) since Sat 2016-09-03 13:20:49 UTC; 2min 23s ago
Docs: https://docs.docker.com
Main PID: 10840 (code=exited, status=1/FAILURE)
Sep 03 13:20:48 chip dockerd[10840]: time="2016-09-03T13:20:48.580271961Z" level=info msg="libcontainerd: new containerd process, pid: 10848"
Sep 03 13:20:49 chip dockerd[10840]: time="2016-09-03T13:20:49.652832502Z" level=error msg="'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded."
Sep 03 13:20:49 chip dockerd[10840]: time="2016-09-03T13:20:49.656854332Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
Sep 03 13:20:49 chip systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Sep 03 13:20:49 chip systemd[1]: Failed to start Docker Application Container Engine.
Sep 03 13:20:49 chip systemd[1]: Unit docker.service entered failed state.
Sep 03 13:20:50 chip systemd[1]: [/etc/systemd/system/docker.service:24] Unknown lvalue 'Delegate' in section 'Service'
Sep 03 13:20:52 chip systemd[1]: [/etc/systemd/system/docker.service:24] Unknown lvalue 'Delegate' in section 'Service'
Sep 03 13:20:53 chip systemd[1]: [/etc/systemd/system/docker.service:24] Unknown lvalue 'Delegate' in section 'Service'
Sep 03 13:20:54 chip systemd[1]: [/etc/systemd/system/docker.service:24] Unknown lvalue 'Delegate' in section 'Service'
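The decisive line in this log is the missing 'overlay' filesystem. One way to check for it directly, sketched here under the assumption that the kernel lists its registered filesystems in /proc/filesystems (standard on Linux), is a small helper like the following. The function name fs_supported is made up for this example, and a filesystem built as a module only shows up once the module is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named filesystem is registered with the
# running kernel. FILE defaults to /proc/filesystems.
fs_supported() {
  local name="$1" file="${2:-/proc/filesystems}"
  # Each line is "[nodev]<TAB>name"; compare the last field exactly.
  awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$file"
}

# Report whether overlay is available before trying to start Docker.
if fs_supported overlay 2>/dev/null; then
  echo "overlay: available"
else
  echo "overlay: not registered with this kernel"
fi
```

On the stock 4.4.11-ntc kernel this would be expected to take the second branch, which matches the error shown by dockerd.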
In order to keep this tutorial short and easy to follow, I’d like to use an already prepared custom kernel which has nearly all the possible kernel modules and settings enabled to run the Docker Engine in an optimized way on the C.H.I.P. computer.
Therefore we only have to install our new Linux kernel and have to reboot the system to activate it.
# install custom Linux Kernel and reboot
curl -sSL https://github.com/hypriot/binary-downloads/releases/download/chip-kernel-4.4.11/4.4.11-hypriotos.tar.bz2 | tar xvfj - -C /
reboot
After rebooting we’re going to check the kernel version again:
uname -a
Linux chip 4.4.11-hypriotos #1 SMP Mon Aug 29 19:18:49 UTC 2016 armv7l GNU/Linux
Check the Docker client version:
docker -v
Docker version 1.12.1, build 23cf638
Check the Docker server version:
docker version
Client:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:31:15 2016
OS/Arch: linux/arm
Server:
Version: 1.12.1
API version: 1.24
Go version: go1.6.3
Git commit: 23cf638
Built: Thu Aug 18 05:31:15 2016
OS/Arch: linux/arm
Get detailed information about the Docker Engine:
docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.12.1
Storage Driver: overlay
Backing Filesystem: <unknown>
Logging Driver: json-file
Cgroup Driver: cgroupfs
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.11-hypriotos
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 1
Total Memory: 491 MiB
Name: chip
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
Finally we can see that the latest Docker Engine v1.12.1 is now installed and running successfully.
As a last step we’d like to start a first Docker container, a small web server.
docker run -d -p 80:80 hypriot/rpi-busybox-httpd
Unable to find image 'hypriot/rpi-busybox-httpd:latest' locally
latest: Pulling from hypriot/rpi-busybox-httpd
c74a9c6a645f: Pull complete
6f1938f6d8ae: Pull complete
e1347d4747a6: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:c00342f952d97628bf5dda457d3b409c37df687c859df82b9424f61264f54cd1
Status: Downloaded newer image for hypriot/rpi-busybox-httpd:latest
Now start your web browser and point it to the website from our Docker container.
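The same check can be done from a terminal. The sketch below assumes curl is installed on the machine you are checking from. CHIP_IP is a placeholder for your board's address, which is not stated here, and the helper name http_ok is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an HTTP GET on URL returns status 200.
http_ok() {
  local url="$1" code
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || return 1
  [ "$code" = "200" ]
}

# Usage (CHIP_IP is a placeholder, set it to your board's address):
#   http_ok "http://${CHIP_IP}/" && echo "web server container is answering"
```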
Additional tip: After installing some packages via apt-get it’s a good idea to clean the APT cache from time to time and save disk space.
root@chip:~# apt-get clean
root@chip:~# df -h
Filesystem Size Used Avail Use% Mounted on
ubi0:rootfs 3.7G 373M 3.3G 11% /
devtmpfs 213M 0 213M 0% /dev
tmpfs 246M 0 246M 0% /dev/shm
tmpfs 246M 6.7M 239M 3% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 246M 0 246M 0% /sys/fs/cgroup
tmpfs 50M 0 50M 0% /run/user/0
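To keep an eye on disk usage from a script, the Use% column can be pulled out of this kind of output. The following is a minimal sketch rather than part of the original tutorial; the function name usage_percent is made up, and it assumes mount points without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read df-style output on stdin and print the Use% number
# (without the % sign) for the given mount point.
usage_percent() {
  local mount="$1"
  # Skip the header line; the mount point is the last field, Use% the one before it.
  awk -v m="$mount" 'NR > 1 && $NF == m { sub(/%$/, "", $(NF-1)); print $(NF-1) }'
}

# Usage on the C.H.I.P.:
#   df -P / | usage_percent /
```

Fed the df -h listing above, the root filesystem line would yield 11.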
Currently the C.H.I.P. isn’t able to run Docker out of the box, but all it takes is installing a custom-built Linux kernel to prepare this awesome ARM board to run Docker easily. With that kernel in place, we’re able to install the officially built Docker Engine from the Docker project’s APT repository.
These are the only commands you need to install Docker:
# install Docker
curl -sSL https://github.com/DieterReuter/arm-docker-fixes/raw/master/002-fix-install-docker-on-chip-computer/apply-fix-002.sh | bash
# install custom Linux Kernel and reboot
curl -sSL https://github.com/hypriot/binary-downloads/releases/download/chip-kernel-4.4.11/4.4.11-hypriotos.tar.bz2 | tar xvfj - -C /
reboot
And the best thing is, according to this tweet, the developers at @NextThingCo have already started to include all the required kernel settings into the standard OS images. So we can expect that the Docker Engine can be installed in the future even without tweaking the Linux kernel.
As I told you at the beginning of this tutorial, these are just the basic steps for a normal user to install and use Docker on the C.H.I.P. computer. But if you’re interested in the technical details behind the scenes, such as how to check and analyze your Linux kernel and how to optimize it for running Docker efficiently, then please drop me a comment or tweet me and I’ll write more about them so you can work your way up to an expert level too. With these skills you should then be able to install Docker on any Linux-based ARM device.
As always use the comments below to give us feedback and share it on Twitter or Facebook.
Please send us your feedback on our Gitter channel or tweet your thoughts and ideas on this project at @HypriotTweets.
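Round-the-clock live-in care by a domestic helper (an "ayi") is still the mainstream idea and choice. Note, however, that the typical domestic worker in elderly care is a woman aged 45-55 with little formal education. Over the past 10 years, average wages in this line of work have risen only to about 200-250% of their earlier level (from 1,500-1,800 yuan/month to 3,500-4,000 yuan/month), the smallest rise of any domestic-service job, even though the work is the most intense and exhausting. This group has limited capacity to learn new skills and is itself facing the need to return home to care for older and younger family members and to provide for its own old age. In the next 5-10 years, first-tier cities will therefore certainly face a shortage of elderly-care helpers (maternity nannies and childcare nannies, by contrast, will boom), which will directly affect the supply capacity of both existing offline providers and startups.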
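The second problem is that a half-day hospital escort alone does not improve an elderly person's health. Chronic diseases in old age need long-term tracking and comprehensive treatment. The most common pattern is to spend one or two half-days at a Grade 3A hospital to confirm a diagnosis at turning points such as a change of season or a change in symptoms, and then to collect medication from the community hospital every week or two (a higher medical-insurance reimbursement rate, plus a shorter distance and no queuing). Consider the pilot of first visits at community level under tiered diagnosis and treatment, the gradual alignment of insurance-covered drug lists with those of Grade 3A hospitals, and the medical alliances and referral mechanisms between community hospitals and nearby Grade 3A hospitals. These still need time to roll out, but seeing a doctor will become steadily less time-consuming and less stressful for the elderly.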
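Judging from JD.com search results, compared with the wide variety of location-tracking watches and wristbands for children, the number of location-tracking products for the elderly is only 29% of the number for children. The best known at present is the 360 wristband. As the government procurement supplier for the Beijing Municipal Committee on Ageing's anti-wandering programme, starting in July 2016 it registers 10,000 elderly people with Beijing household registration, aged 60 or over, who have memory impairment, cognitive impairment or diagnosed dementia, and distributes wristbands free of charge in their local communities within 30 days.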
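Although reverse mortgages (以房养老) cannot raise the income of the elderly in the short term, among the problems that arise we can see that the second point, health and chronic-disease management, and the fifth point, vertical services built around the needs of the elderly, are the startup opportunities in the elderly-care industry. We believe that letting the elderly and their families enjoy good service while spending less is itself a way of raising the elderly's disposable income by cutting costs.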
Figure 1: Services are the foundation, hardware is a supplement
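Later parts of this article focus on "square dancing and domestic services", which show that mutual help among the elderly can already serve as one form of supply for quality services: the natural mutual understanding among elderly people makes sharing the relevant skills more efficient and more valuable. In developed countries such as Denmark there are also communities where the elderly and young people live together, and young people get their rent reduced or waived as long as they provide a fixed number of hours of companionship or care each week. On the well-known US crowdfunding site Kickstarter, "Present Perfect", which raised more than 100,000 US dollars, likewise brings kindergarten children together with nursing-home residents on a regular basis to talk and interact, with very good mutual-help results.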
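On "housing": Airbnb data from July 2015 show that 10% of hosts worldwide are over 60; 56% of them are retired and 49% live on a fixed retirement income. Economically, senior hosts welcome guests for about 60 days a year on average and earn about 6,000 US dollars a year on average, enough to pay for necessities or travel. In terms of social recognition, 74% of senior hosts live alone or with only one companion. Visiting guests bring in a fresh world, the hosts put more care into their service, and they earn 7.5% more five-star reviews than other host groups.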
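On "transport": Didi Chuxing's April 2016 "Mobile Travel and Driver Employment Report" (移动出行与司机就业报告) states that drivers born in the 1960s make up 9% of premier and express car drivers, 4% of Hitch (ride-sharing) drivers and 1% of designated drivers. With the new ride-hailing regulations taking effect on 1 November 2016, we will also watch their impact. At Uber in the US, as of January 2015, 25% of Uber drivers were over 50 and 3% were already retirees. In addition, the US retirement organisation Life Reimagined and the American Association of Retired Persons (AARP) both continue to work with Uber and have recruited nearly 1,000 senior drivers for the platform in total.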
Viewpoint 2: In mobile internet penetration and the maturity of the sharing economy, China still has a long way to go
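Figure 3: Built on strong trust within the "dama" (middle-aged women's) community, influencer-led distribution will bring incremental revenue in related fields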
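What do people who take part in square dancing want? Health (exercise) + happiness (relief from loneliness) + achievement (recognition). The value a square-dancing app offers on these points is: learning high-quality choreography more efficiently, meeting reliable dance partners more efficiently, and spreading content widely more efficiently.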
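In the Senior Care category on Care.com, the average pay is 13.75 US dollars/hour (about 93 yuan/hour). The platform lists 3,855 service providers; in a sample of 1,000, those aged 50+ account for 24%. About 40% of this group have 10-20 years of experience, more than 80% are looking for part-time work, and typical pay is 20-25 US dollars/hour. Five providers were aged 70 or over. It is striking that 70-year-olds are still caring for people aged 80-90, and some have worked in elderly-related jobs for 40 years straight since the age of 26. They have built up strong professional qualifications, hands-on experience and industry depth, and they retain lasting enthusiasm and a heartfelt love of caring for and nursing the elderly.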
Figure 4: Younger seniors caring for their elders, looking for time-shared service needs
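On 30 June 2016 (Beijing time), Google Capital made its first investment in a listed company: after Care.com received 46.35 million US dollars in financing, Google formally became Care.com's largest shareholder.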
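Fengkuang Laoshi (疯狂老师), for its part, focuses on selecting the top 30% of teachers. It has 150 teachers in Beijing in total, of whom (near-)retired teachers aged 50+ make up 2.7% (4 people). According to the weekly teaching-hours curve, teachers teach 3.3 hours/week on average. At a price of 400-500 yuan/hour, weekly income is 1,320-1,650 yuan and the added monthly income is 5,280-6,600 yuan (counting four weeks per month), a relatively valuable sum for an older teacher. The high price reflects Fengkuang Laoshi's stronger offline character (small in-person classes, 1 teacher to 2-6 students). Teachers may have to coordinate both time and place, so participation is likely to suffer.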
Silicon Labs, the leader in energy-friendly solutions for a smarter, more connected world, has been constantly making silicon, software and tools to help engineers transform industries and improve lives since 1996.
Silicon Labs has just launched its newest development platform, the Thunderboard Sense Kit. Thunderboard Sense is a small, feature-packed development platform for battery-operated IoT applications. It is paired with a mobile app that seamlessly connects Thunderboard Sense to a real-time cloud database.
The mobile app enables a quick proof of concept of cloud-connected sensors. The multi-protocol radio, combined with a broad selection of on-board sensors, makes the Thunderboard Sense an excellent platform to develop and prototype a wide range of battery-powered IoT applications.
The 30 mm x 45 mm board includes these energy-friendly
Onboard sensors measure data and transmit it wirelessly to the cloud. Thunderboard Sense comes with Silicon Labs’ ready-to-use cloud-connected IoT mobile apps, to collect and view real-time sensor data for cloud-based analytics and business intelligence.
“We’ve designed Thunderboard Sense to inspire developers to create innovative, end-to-end IoT solutions from sensor nodes to the cloud,” said Raman Sharma, Director of Silicon Labs’ IoT Developer Experience. “Thunderboard Sense helps developers make sense of everything in the IoT. They can move quickly from proof of concept to end product and develop a wide range of wireless sensing applications that leverage best-in-class cloud analytics software and business intelligence platforms.”
Check out the official intro video by Raman Sharma
To start using Thunderboard Sense, insert a CR2032 battery with the correct polarity, install the mobile app from Google Play or the Apple App Store, and find your board listed on the main screen of the app. You are then ready to explore the Thunderboard demos and start your own project! You can program Thunderboard Sense using the USB Micro-B cable and onboard J-Link debugger. You do not need RF design expertise to develop wireless sensor node applications.
The Thunderboard Sense kit is available for $36. All hardware, software and design files will be open and accessible to developers. You can visit Silicon Labs’ GitHub to download the Thunderboard mobile app and cloud software source code.
Progressive Web Apps with React.js [Part I]: Introduction. Courtesy of Addy Osmani at Google
Progressive Web Apps with React.js [Part II]
Redux Step by Step: A Simple and Robust Workflow for Real Life Apps. Courtesy of Tal Kol and Hackernoon
Implementation Notes: A detailed advanced level tutorial on how React really works. Courtesy of Dan Abramov at Facebook
ARc: A progressive React starter kit based on the Atomic Design methodology. Courtesy of Diego Haz and Brad Frost
Styled Components: Visual primitives for React and React Native. Use the best bits of ES6 and CSS to style your apps [747 stars on GitHub]
N1: An extensible desktop mail app built with React.
Redux VCR: Record and replay user sessions in real time with React [390 stars on GitHub]
React for Beginners. Courtesy of Wes Bos
