Thursday, September 17, 2015

Twisted.python.failure.Failure after Scrapy fetch()

I ran into this error when I tried to fetch() a scrapy.Request containing form data in the Scrapy shell:


r = Request(url=url, 
            body=urllib.urlencode({'formparam1':'value1'}), 
            dont_filter=True)
fetch(r)

With the following output:
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Retrying <GET http://www.website.com/form.aspx> (failed 2 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-09-17 14:48:03 [scrapy] DEBUG: Gave up retrying <GET http://www.website.com/form.aspx> (failed 3 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]

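Instead, I should have used scrapy.FormRequest, which handles form data natively: it URL-encodes the formdata dictionary and sends it as a POST, whereas the plain Request above went out as a GET (see the log) with the form data in its body.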

# same form fields as in the failing request
payload = {'formparam1': 'value1'}
r = FormRequest(url=url,
                formdata=payload,
                dont_filter=True)
fetch(r)

2015-09-17 14:53:07 [scrapy] DEBUG: Redirecting (302) to <GET http://www.website.com/form.aspx> from <POST http://www.website.com/form.aspx>
2015-09-17 14:53:08 [scrapy] DEBUG: Crawled (200) <GET http://www.website.com/form.aspx> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f343c6d1b10>
[s]   item       {}
[s]   r          <POST http://www.website.com/form.aspx>
[s]   request    <POST http://www.website.com/form.aspx>
[s]   response   <200 http://www.website.com/form.aspx>
[s]   settings   <scrapy.settings.Settings object at 0x7f343c6d1a90>
[s]   spider     <Spider 'spider' at 0x7f343ae50f50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
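
For comparison, here is a minimal sketch of what a plain Request would need in order to carry the same form data. form_post_kwargs is a hypothetical helper of mine, not part of Scrapy, and it uses Python 3's urllib.parse instead of the Python 2 urllib.urlencode shown above. It only builds the keyword arguments, so it runs without Scrapy installed.

```python
from urllib.parse import urlencode


def form_post_kwargs(url, formdata):
    """Build keyword arguments describing a URL-encoded form POST.

    Hypothetical helper: the result is meant to be passed to
    scrapy.Request(**kwargs), but nothing here imports Scrapy.
    """
    return {
        "url": url,
        "method": "POST",  # a plain Request defaults to GET
        "body": urlencode(formdata),
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "dont_filter": True,
    }


kwargs = form_post_kwargs("http://www.website.com/form.aspx",
                          {"formparam1": "value1"})
print(kwargs["method"], kwargs["body"])
```

In the Scrapy shell, fetch(Request(**kwargs)) should then behave much like the FormRequest version, although FormRequest is still the shorter way to write it.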



Wednesday, September 10, 2014

Tilde key on Mac Air with Ubuntu




How to get the backtick (`) and tilde (~) symbols in Ubuntu installed on a MacBook Air with an EU keyboard (instead of the backslash (\) and pipe (|) symbols that show up by default):

1. Run xev and press the tilde key. Find the keycode associated with this key in the output.

2. Edit or create the file ~/.xmodmaprc and add the following line to it:

keycode <keycode from xev output> = grave asciitilde 

3. Run: xmodmap ~/.xmodmaprc
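
Steps 1 and 2 can be sketched in a few lines of Python: pull the number out of xev's "keycode N" text and format the xmodmap line from it. xmodmap_line is a hypothetical helper, and the sample line with keycode 94 is only an illustration; the number on your machine may differ.

```python
import re


def xmodmap_line(xev_output, keysyms=("grave", "asciitilde")):
    """Return an xmodmap 'keycode N = ...' line for the first key event found.

    xev prints the code as 'keycode <number>' inside each key event block.
    """
    match = re.search(r"keycode\s+(\d+)", xev_output)
    if match is None:
        raise ValueError("no keycode found in xev output")
    return "keycode {} = {}".format(match.group(1), " ".join(keysyms))


# Illustrative xev line; the number 94 is an assumption, yours may differ.
sample = "    state 0x0, keycode 94 (keysym 0x5c, backslash), same_screen YES,"
print(xmodmap_line(sample))
```

The printed line is what goes into ~/.xmodmaprc.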

The method is taken from here: http://stackoverflow.com/questions/17757232/switch-tab-and-backtick-keys-ubuntu-linux

Wednesday, January 22, 2014

How to read XML file into pandas dataframe using lxml

This is probably not the most efficient way, but it's convenient and simple.

Let's pretend that we're analyzing the file with the content listed below:

<xml_root>

    <object>
        <id>1</id>
        <name>First</name>
    </object>

    <object>
        <id>2</id>
        <name>Second</name>
    </object>

    <object>
        <id>3</id>
        <name>Third</name>
    </object>

    <object>
        <id>4</id>
        <name>Fourth</name>
    </object>

</xml_root>

First, we need to import objectify from lxml:

from lxml import objectify

Then, open the file:

path = 'file_path'
xml = objectify.parse(open(path))

Get the root node:

root = xml.getroot()

Now we can access child nodes, and with
root.getchildren()[0].getchildren()
we're able to get the actual content of the first child node as a simple Python list:

[1, 'First']

Now we obviously want to convert this data into a data frame.

Let's import pandas:

import pandas as pd

Prepare an empty data frame that will hold our data:

df = pd.DataFrame(columns=('id', 'name'))

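Now we go through our XML file, adding each object to this dataframe as a new row:

for i, node in enumerate(root.getchildren()):
    obj = node.getchildren()
    row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    # DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame
    df = pd.concat([df, row_s.to_frame().T])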

(The name of the Series object becomes the row's index label when the row is added to the DataFrame.)

And here is our fresh dataframe:

  id    name
0  1   First
1  2  Second
2  3   Third
3  4  Fourth


Full source code:

from lxml import objectify
import pandas as pd

path = 'file_path'
xml = objectify.parse(open(path))
root = xml.getroot()
root.getchildren()[0].getchildren()
df = pd.DataFrame(columns=('id', 'name'))

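for i, node in enumerate(root.getchildren()):
    obj = node.getchildren()
    row = dict(zip(['id', 'name'], [obj[0].text, obj[1].text]))
    row_s = pd.Series(row)
    row_s.name = i
    # DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame
    df = pd.concat([df, row_s.to_frame().T])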
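
Since growing a dataframe row by row is the slow part of this approach, one alternative is to collect all rows first and build the frame once. The sketch below is only an illustration of that idea: it swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages, and rows_from_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XML = """<xml_root>
    <object><id>1</id><name>First</name></object>
    <object><id>2</id><name>Second</name></object>
    <object><id>3</id><name>Third</name></object>
    <object><id>4</id><name>Fourth</name></object>
</xml_root>"""


def rows_from_xml(text):
    """Turn each child of the root into a {tag: text} dictionary."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in obj} for obj in root]


rows = rows_from_xml(XML)
print(rows[0])
# With pandas installed, the whole frame can be built in one call:
#   df = pd.DataFrame(rows, columns=["id", "name"])
```

The column names come from the XML tags themselves, so nothing about the number of objects or fields is hardcoded.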

Monday, December 9, 2013

Things to do after installing fresh Ubuntu on a laptop

(Serves mostly as a reminder for myself; someday I'll probably turn it into a script. Updated regularly; applicable to 12.04 LTS.)

1. Install power saving tweakers:

http://askubuntu.com/questions/300953/how-can-i-improve-battery-life-on-my-laptop

2. Enable Hibernation:

http://askubuntu.com/questions/94754/how-to-enable-hibernation

3. Disable notification balloons:

http://askubuntu.com/questions/13464/how-can-i-customize-disable-notification-bubbles

4. Install password manager

5. Set up SSH config

Tuesday, November 19, 2013

Enable SSH private/public key authentication & set up the SSH config file on Linux (Ubuntu)


If you access your remote SSH server with a login/password pair and are tired of entering the password every time you log in (there is no way to save a password in ~/.ssh/config), consider switching to private/public key authentication and setting up the config file, so you'll be able to log in like this:

ssh yourhost

That way you don't have to specify the host parameters and a password every time.

The setup consists of two simple steps: generating a key pair and setting up a config file for SSH.


Generating a keypair


First, generate a key pair using the ssh-keygen tool, like this:
ssh-keygen -t dsa

You'll see the following output:
Generating public/private dsa key pair.
Enter file in which to save the key (/home/yourname/.ssh/id_dsa): (hit Enter)
Enter passphrase (empty for no passphrase): (hit Enter again)
Enter same passphrase again: (hit Enter one more time)
Your identification has been saved in /home/yourname/.ssh/id_dsa.
Your public key has been saved in /home/yourname/.ssh/id_dsa.pub.


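You can also use ssh-keygen -t rsa for better security. In fact, OpenSSH 7.0 and later disable DSA keys by default, so on a current system use rsa and read id_rsa wherever id_dsa appears below.
Two new files now appear in your ~/.ssh directory: id_dsa and id_dsa.pub.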
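
The interactive prompts can also be skipped by passing the answers as options: -t for the key type, -f for the output file and -N for the passphrase. Below is a minimal sketch that only assembles the command. keygen_command is a hypothetical helper, and the RSA type and the ~/.ssh/id_rsa path are my assumptions here rather than the values used in the walkthrough; the empty passphrase matches it.

```python
import os


def keygen_command(key_type="rsa", key_path="~/.ssh/id_rsa", passphrase=""):
    """Return the ssh-keygen argument list for a non-interactive run.

    -t selects the key type, -f the output file, -N the new passphrase.
    """
    return ["ssh-keygen", "-t", key_type,
            "-f", os.path.expanduser(key_path),
            "-N", passphrase]


cmd = keygen_command()
print(cmd)
# To actually create the key pair: subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than one string) lets the empty passphrase survive as its own argument.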

Now you have to copy id_dsa.pub to your remote host (this assumes the ~/.ssh directory already exists there). We'll use scp for that:

scp ~/.ssh/id_dsa.pub name@host:~/.ssh/

If you need to specify a port, use the -P option:

scp -P <port number goes here> ~/.ssh/id_dsa.pub name@host:~/.ssh/

It will ask for your SSH password; type it and hit Enter.

Okay, now your public key is copied to the remote host.

Now, SSH to the host (it's the last time you're going to type the password, I promise!):

ssh name@host

Append id_dsa.pub to authorized_keys (a plain mv would overwrite any keys already stored there) and remove the copy:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys && rm ~/.ssh/id_dsa.pub

Restrict the permissions of the folder and the file so that only you can access them:

chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
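
The remote-side steps (add the key to authorized_keys, then restrict permissions) can be sketched as one function. This is a minimal illustration under my own assumptions, not a replacement for the commands above: install_public_key is a hypothetical helper, the key string is a placeholder, and the demo runs in a throwaway directory instead of a real ~/.ssh.

```python
import os
import tempfile


def install_public_key(ssh_dir, public_key):
    """Append public_key to authorized_keys (once) and tighten permissions."""
    os.makedirs(ssh_dir, exist_ok=True)
    auth_path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key.strip()

    content = ""
    if os.path.exists(auth_path):
        with open(auth_path) as fh:
            content = fh.read()

    # Skip the write if this exact key line is already present.
    if key not in [line.strip() for line in content.splitlines()]:
        with open(auth_path, "a") as fh:
            if content and not content.endswith("\n"):
                fh.write("\n")
            fh.write(key + "\n")

    os.chmod(ssh_dir, 0o700)    # only the owner may enter the directory
    os.chmod(auth_path, 0o600)  # only the owner may read or write the file
    return auth_path


# Demonstration in a throwaway directory with a placeholder key string.
demo_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
demo_key = "ssh-rsa AAAA...placeholder yourname@laptop"
install_public_key(demo_dir, demo_key)
path = install_public_key(demo_dir, demo_key)  # second call adds nothing
with open(path) as fh:
    print(len(fh.readlines()))
```

Appending instead of overwriting keeps any keys that were already authorized, and running it twice does not duplicate the line.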

Setting up the SSH config file

This step is much simpler.

On your local machine, create the SSH config directory (if it doesn't exist yet) and then the config file with your host parameters:

mkdir -p ~/.ssh

cd ~/.ssh

touch config
nano config

Then type your remote host details:

Host host1
HostName host1.example.com
User yourname
IdentityFile "~/.ssh/id_dsa"

Save the file by pressing Ctrl + X, then Y, then Enter.

host1 is now the alias for your remote host host1.example.com

To ssh to your host, simply type:

ssh host1

You will be logged in without being asked for a password. Much better now.
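
If you have several hosts, the config entries can be generated instead of typed. A minimal sketch, assuming the same four parameters as in the entry above plus an optional Port for servers on a non-default port; host_block is a hypothetical helper, and the indentation inside the block is only for readability.

```python
def host_block(alias, hostname, user, identity_file, port=None):
    """Render one Host entry for ~/.ssh/config."""
    lines = [
        "Host {}".format(alias),
        "    HostName {}".format(hostname),
        "    User {}".format(user),
        "    IdentityFile {}".format(identity_file),
    ]
    if port is not None:
        lines.append("    Port {}".format(port))  # only for non-default ports
    return "\n".join(lines) + "\n"


entry = host_block("host1", "host1.example.com", "yourname", "~/.ssh/id_dsa")
print(entry, end="")
```

The returned text can be appended to ~/.ssh/config as is.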