lab notebook
Saturday, 10 October 2015
One-liner to read a list of JSON objects from a text file into a Pandas dataframe
df = pd.read_json("[%s]" % ",".join(open('file_name').readlines()))
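A minimal sketch of why the one-liner works, using only the standard library so it runs without pandas: joining the lines with commas and wrapping them in brackets turns a file holding one JSON object per line into a single valid JSON array. The sample records and the names `lines_to_json_array` and `sample` are made up for illustration.

```python
import io
import json


def lines_to_json_array(lines):
    """Join one-JSON-object-per-line text into a single JSON array string."""
    # Skip blank lines so a trailing newline does not produce an empty element.
    return "[%s]" % ",".join(line.strip() for line in lines if line.strip())


# Hypothetical sample data standing in for the text file.
sample = io.StringIO('{"id": 1, "name": "First"}\n{"id": 2, "name": "Second"}\n')

with sample as handle:  # a real file would be opened with open('file_name')
    records = json.loads(lines_to_json_array(handle))

print(records)
```

The string built here is what `pd.read_json` receives in the one-liner. Newer pandas releases can also read this layout directly with `pd.read_json('file_name', lines=True)`.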
Friday, 18 September 2015
Thursday, 17 September 2015
Twisted.python.failure.Failure after Scrapy fetch()
I encountered this error when I tried to fetch() a scrapy.Request containing form data in the Scrapy shell:
r = Request(url=url, body=urllib.urlencode({'formparam1':'value1'}), dont_filter=True)
fetch(r)
With the following output:
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 2 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Gave up retrying <GET http://www.website.com/form.aspx> (failed 3 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
Instead, I should have used scrapy.FormRequest, which handles form data natively:
r = FormRequest(url=url, formdata=payload, dont_filter=True)
fetch(r)
2015-09-17 14:53:07 [scrapy] DEBUG: Redirecting (302) to <GET http://www.website.com/form.aspx> from <POST http://www.website.com/form.aspx>
2015-09-17 14:53:08 [scrapy] DEBUG: Crawled (200) <GET http://www.website.com/form.aspx> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f343c6d1b10>
[s] item {}
[s] r <POST http://www.website.com/form.aspx>
[s] request <POST http://www.website.com/form.aspx>
[s] response <200 http://www.website.com/form.aspx>
[s] settings <scrapy.settings.Settings object at 0x7f343c6d1a90>
[s] spider <Spider 'spider' at 0x7f343ae50f50>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
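The first log shows the plain Request going out as a GET even though it carried a form body, which is likely what the server rejected. Given formdata, FormRequest switches the method to POST, URL-encodes the fields and sets the form Content-Type. A rough standard-library sketch (Python 3) of the equivalent request is shown here; nothing is sent over the network, and the URL and payload are the placeholders used in this post.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.website.com/form.aspx'  # placeholder URL from the log
payload = {'formparam1': 'value1'}        # placeholder form field

# URL-encode the form fields into the request body.
body = urlencode(payload).encode('ascii')

# Build (but do not send) a POST request carrying the encoded form.
req = Request(
    url,
    data=body,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    method='POST',
)

print(req.get_method(), req.data)
```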
Wednesday, 10 September 2014
Tilde key on Mac Air with Ubuntu
How to get the backtick (`) and tilde (~) symbols in Ubuntu installed on a MacBook Air with an EU keyboard (instead of the backslash (\) and pipe (|) symbols that show up by default):
1. Run
xev
and press the tilde key. Find the keycode associated with this key in the output.
2. Change or create file
~/.xmodmaprc
and add the following text to it:
keycode <keycode from xev output> = grave asciitilde
3. Run:
xmodmap ~/.xmodmaprc
The method is taken from here: http://stackoverflow.com/questions/17757232/switch-tab-and-backtick-keys-ubuntu-linux
Wednesday, 22 January 2014
How to read XML file into pandas dataframe using lxml
This is probably not the most effective way, but it's convenient and simple.
Let's pretend that we're analyzing the file with the content listed below:
<xml_root>
<object>
<id>1</id>
<name>First</name>
</object>
<object>
<id>2</id>
<name>Second</name>
</object>
<object>
<id>3</id>
<name>Third</name>
</object>
<object>
<id>4</id>
<name>Fourth</name>
</object>
</xml_root>
First, we need to import objectify from lxml:
from lxml import objectify
Then, open the file:
path = 'file_path'
xml = objectify.parse(open(path))
Get the root node:
root = xml.getroot()
Now we can access child nodes, and with
root.getchildren()[0].getchildren()
we're able to get the actual content of the first child node as a simple Python list:
[1, 'First']
Now we obviously want to convert this data into a data frame.
Let's import pandas:
import pandas as pd
Prepare an empty data frame that will hold our data:
df = pd.DataFrame(columns=('id', 'name'))
Now we go through our XML file, appending data to this dataframe:
for i in range(0, 4):
    obj = root.getchildren()[i].getchildren()
    row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    df = df.append(row_s)
(the name of the Series object serves as the index label when the object is appended to the DataFrame)
And here is our fresh dataframe:
  id    name
0  1   First
1  2  Second
2  3   Third
3  4  Fourth
Full source code:
from lxml import objectify
import pandas as pd

path = 'file_path'
xml = objectify.parse(open(path))
root = xml.getroot()
root.getchildren()[0].getchildren()

df = pd.DataFrame(columns=('id', 'name'))

for i in range(0, 4):
    obj = root.getchildren()[i].getchildren()
    row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    df = df.append(row_s)
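As a variation on the loop, the rows can be collected in a plain list first and turned into a frame in one step. That avoids the hard-coded `range(0, 4)` and the repeated `append` calls (`DataFrame.append` was removed in pandas 2.0). The sketch below is not the method used in this post: it swaps lxml for the standard library's `xml.etree.ElementTree` so it runs without extra packages, and `xml_text` and `rows` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample file from this post.
xml_text = """
<xml_root>
  <object><id>1</id><name>First</name></object>
  <object><id>2</id><name>Second</name></object>
</xml_root>
""".strip()

root = ET.fromstring(xml_text)

# One dict per <object>, keyed by child tag name; values stay strings.
rows = [{child.tag: child.text for child in obj} for obj in root]

print(rows)
```

Passing the list to `pd.DataFrame(rows, columns=['id', 'name'])` then builds the same two-column frame in a single call.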
Monday, 9 December 2013
Things to do after installing fresh Ubuntu on a laptop
(serves mostly as a reminder for myself, someday I'll probably turn it into a script; updated regularly; applicable to 12.04 LTS)
1. Install power saving tweakers:
http://askubuntu.com/questions/300953/how-can-i-improve-battery-life-on-my-laptop
2. Enable Hibernation:
http://askubuntu.com/questions/94754/how-to-enable-hibernation
3. Disable notification balloons:
http://askubuntu.com/questions/13464/how-can-i-customize-disable-notification-bubbles
4. Install password manager
5. Set up SSH config
Tuesday, 19 November 2013
Enable SSH private/public key authorization & setting up SSH config file on Linux (Ubuntu)
If you access your remote SSH server with a login/password pair and are tired of entering the password every time you log in (there is no way to save your password in ~/.ssh/config), consider switching to private/public key authorization instead and setting up the config file, so you'll be able to log in like this:
ssh yourhost
without specifying your host parameters and password every time.
The setup consists of two simple steps: generating a key pair and setting up a config file for SSH.
Generating a keypair
First, generate a key pair using the ssh-keygen tool:
ssh-keygen -t dsa
You'll see the following output:
Generating public/private dsa key pair.
Enter file in which to save the key (/home/yourname/.ssh/id_dsa): hit enter
Enter passphrase (empty for no passphrase): again, hit enter
Enter same passphrase again: hit enter one more time
Your identification has been saved in /home/yourname/.ssh/id_dsa.
Your public key has been saved in /home/yourname/.ssh/id_dsa.pub.
You can also use ssh-keygen -t rsa for better security (the key files are then named id_rsa and id_rsa.pub). Note that OpenSSH 7.0 and later disable DSA keys by default.
Two new files now appear in your ~/.ssh directory: id_dsa and id_dsa.pub
Next, copy id_dsa.pub to your remote host. We'll use scp for that:
scp ~/.ssh/id_dsa.pub name@host:~/.ssh/
If you need to specify a port, use the -P option:
scp -P <port number goes here> ~/.ssh/id_dsa.pub name@host:~/.ssh/
It will ask for your SSH password, type it and hit enter.
Okay, now your pub key is copied to the remote host.
Now, SSH to the host (it's the last time you're going to type the password, I promise!):
ssh name@host
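Change the name of id_dsa.pub to authorized_keys (if authorized_keys already exists, append to it instead with cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys, so the existing keys are not overwritten):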
mv ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
Change permissions of the file & folder:
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
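The remote-side steps can also be sketched in Python. This is an illustration rather than part of the original procedure: the function name `install_public_key` is made up, the key string is a placeholder, the function appends the key instead of replacing `authorized_keys` so existing keys survive, and it applies the owner-only 700/600 permissions commonly recommended for SSH files.

```python
import os
import tempfile


def install_public_key(ssh_dir, public_key):
    """Append public_key to ssh_dir/authorized_keys with owner-only permissions."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    path = os.path.join(ssh_dir, 'authorized_keys')
    existing = ''
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()
    key = public_key.strip()
    if key not in existing.splitlines():
        with open(path, 'a') as handle:
            # Make sure the new key starts on its own line.
            if existing and not existing.endswith('\n'):
                handle.write('\n')
            handle.write(key + '\n')
    os.chmod(path, 0o600)
    return path


# Demo in a throwaway directory; on a real host ssh_dir would be ~/.ssh.
demo_dir = os.path.join(tempfile.mkdtemp(), '.ssh')
demo_path = install_public_key(demo_dir, 'ssh-rsa AAAA...placeholder user@laptop')
with open(demo_path) as handle:
    print(handle.read())
```

Calling the function a second time with the same key leaves the file unchanged, so it is safe to re-run.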
Setting up the SSH config file
This step is much simpler. Just create the ssh config directory (if it does not exist yet) and then the config file with your host parameters:
mkdir -p ~/.ssh
cd ~/.ssh
touch config
nano config
Then type your remote host details:
Host host1
    HostName host1.example.com
    User yourname
    IdentityFile "~/.ssh/id_dsa"
Save the file by pressing Ctrl + X, then Y to confirm, then Enter
host1 is now the alias for your remote host host1.example.com
To ssh to your host, simply type:
ssh host1
in the terminal, and you will be logged in. Much better now.
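With several hosts, the config entries can be generated instead of typed by hand. A small sketch under stated assumptions: `render_host_block` is a hypothetical helper, the values are the placeholders from this post, and the optional `Port` line covers the non-default port case mentioned for scp.

```python
def render_host_block(alias, hostname, user, identity_file, port=None):
    """Return an ssh config Host entry as text."""
    lines = [
        'Host %s' % alias,
        '    HostName %s' % hostname,
        '    User %s' % user,
        '    IdentityFile %s' % identity_file,
    ]
    if port is not None:
        # Only emit Port when the server listens on a non-default port.
        lines.append('    Port %d' % port)
    return '\n'.join(lines) + '\n'


block = render_host_block('host1', 'host1.example.com', 'yourname', '~/.ssh/id_dsa')
print(block)
```

The returned text can be appended to ~/.ssh/config as is.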