James Morris - JMOZ

Webmaster in '97.

Python: Convert Datetime to Timestamp and Back Again

It’s pretty easy to get a datetime object from a timestamp in Python. However, the same cannot be said for going the other way. Just when I start to think “oh, Python’s pretty cool with a nice API”, I look further into the language and library and start to notice some cracks.

Convert a timestamp to a datetime object

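from datetime import datetime

print(datetime.fromtimestamp(1346236702))

# 2012-08-29 11:38:22
# fromtimestamp() returns local time; this output is UTC+1 (UK summer time).
# In UTC the same timestamp is 2012-08-29 10:38:22.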

Convert a datetime object to a timestamp

Why oh why did they not just implement a datetime.totimestamp() method on datetime?

from datetime import datetime
import time

dt = datetime.fromtimestamp(1346236702)

# time.mktime() interprets the timetuple as local time, matching fromtimestamp()
print(time.mktime(dt.timetuple()))

#1346236702.0
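One caveat: both fromtimestamp() and time.mktime() work in the machine's local time zone, which is why the round trip above comes back unchanged. If you want UTC instead, calendar.timegm() is the inverse of time.gmtime(), and from Python 3.3 onwards datetime objects finally have a timestamp() method. The sketch below shows all three round trips; it assumes Python 3.3+ for the last one.

```python
import calendar
import time
from datetime import datetime, timezone

TS = 1346236702

# Local-time round trip (as above): fromtimestamp() gives naive local time and
# time.mktime() interprets the tuple as local time, so the two cancel out.
local_dt = datetime.fromtimestamp(TS)
local_back = int(time.mktime(local_dt.timetuple()))

# UTC round trip: build an aware UTC datetime, then convert back with
# calendar.timegm(), which treats the tuple as UTC rather than local time.
utc_dt = datetime.fromtimestamp(TS, tz=timezone.utc)
utc_back = calendar.timegm(utc_dt.utctimetuple())

# Python 3.3+ only: aware datetimes convert back directly.
aware_back = int(utc_dt.timestamp())

print(utc_dt.isoformat())  # 2012-08-29T10:38:22+00:00
print(local_back, utc_back, aware_back)  # 1346236702 1346236702 1346236702
```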

Improve Redis Performance With Pipelining

I have been playing with Python and Redis recently, specifically redis-py which is available on github and can be easily installed with pip install redis.

With NoSQL (Redis), a common usage pattern is to save items, such as users or tweet-type things, as hashes indexed in a set or list, then retrieve them individually by key in a large loop. In SQL this is analogous to querying a table by id in a loop rather than using one select statement with a WHERE clause. If you did this in SQL other developers would probably talk about what a noob you were over Skype; in Redis it’s OK because Redis is web-scale and optimised for this kinda stuff.

I’ve been wondering about the performance of Redis and redis-py’s pipelining feature which “can be used to dramatically increase the performance of groups of commands by reducing the number of back-and-forth TCP packets between the client and server”.

I created a script that simulates some typical Redis processing with and without pipelining and timed it using Python’s timeit.

Method

To test the performance of the Redis pipeline feature the following actions are executed non-pipelined and then pipelined.

  • Create 10000 user keys in a list.
  • Update 10000 user hashes with id, name and email fields.
  • Retrieve 10000 user hashes into a Python list.
  • Delete the user list and 10000 user hashes.

Each test is executed once per timing, the timing is repeated 7 times, and the minimum is reported. Some of the tests have setup code that is executed by timeit. I also throw in a timing of the cleanup() delete.

The pipe() function I have written decorates the test functions with pipelining: it replaces the Redis r instance inside the test function with a pipeline instance, then executes the pipeline.
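To show the idea behind pipe() without needing a Redis server, here's a hedged Python sketch, not the script used for the timings below. FakeRedis and FakePipeline are hypothetical in-memory stand-ins (not redis-py) that only count network round trips, and the test functions take the client as an argument rather than using a global r. With real redis-py the decorator would call pipeline() and execute() on a redis.Redis instance in the same way.

```python
import functools


class FakeRedis:
    """Hypothetical stand-in for redis.Redis: every command costs one round trip."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def _apply(self, name, *args):
        # Execute a single command against the in-memory data.
        if name == "hset":
            key, field, value = args
            self.data.setdefault(key, {})[field] = value
            return 1
        if name == "hgetall":
            return dict(self.data.get(args[0], {}))
        raise ValueError("unsupported command: " + name)

    def hset(self, key, field, value):
        self.round_trips += 1
        return self._apply("hset", key, field, value)

    def hgetall(self, key):
        self.round_trips += 1
        return self._apply("hgetall", key)

    def pipeline(self):
        return FakePipeline(self)


class FakePipeline:
    """Queues commands and sends them in one round trip when execute() is called."""

    def __init__(self, client):
        self.client = client
        self.queued = []

    def hset(self, key, field, value):
        self.queued.append(("hset", key, field, value))
        return self

    def hgetall(self, key):
        self.queued.append(("hgetall", key))
        return self

    def execute(self):
        self.client.round_trips += 1  # the whole batch travels together
        results = [self.client._apply(*cmd) for cmd in self.queued]
        self.queued = []
        return results


def pipe(func):
    """Run func against a pipeline instead of the client, then execute it once."""
    @functools.wraps(func)
    def wrapper(r, *args, **kwargs):
        p = r.pipeline()
        func(p, *args, **kwargs)
        return p.execute()
    return wrapper


def update_users(r, n):
    for i in range(n):
        r.hset("user:%d" % i, "name", "user%d" % i)


def retrieve_users(r, n):
    return [r.hgetall("user:%d" % i) for i in range(n)]


r = FakeRedis()
update_users(r, 100)
print(r.round_trips)  # 100: one round trip per command

r.round_trips = 0
pipe(update_users)(r, 100)
users = pipe(retrieve_users)(r, 100)
print(r.round_trips, len(users))  # 2 100: one round trip per pipeline
```

In real redis-py, pipeline() wraps the batch in MULTI/EXEC by default (transaction=True); pass transaction=False if you only want the batching.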

The script

The results

(venv)james on moz-air in ~/code/jmoz.co.uk-flask (develop)
 $ py redis-pipe-test.py 
* working with 10000 items
* repeating timeit() 7 times with 1 iteration each time

create_users()
0.863823890686
pipe(create_users)
0.132288217545

update_users()
1.03583908081
pipe(update_users)
0.276007175446

retrieve_users()
0.896469116211
pipe(retrieve_users)
0.172962903976

cleanup()
0.829195022583

pipe(cleanup)
0.105372905731

Pipelining clearly improves the performance of multiple calls to Redis: it’s roughly 5 times faster overall when using pipeline() (between about 3.8x and 7.9x depending on the test), so start using it!
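To put numbers on "roughly 5 times", here's the per-test speedup worked out from the timings above. This is just arithmetic on the reported minimums; nothing is re-measured.

```python
# Minimum timings in seconds copied from the results above: (plain, pipelined)
timings = {
    "create_users": (0.863823890686, 0.132288217545),
    "update_users": (1.03583908081, 0.276007175446),
    "retrieve_users": (0.896469116211, 0.172962903976),
    "cleanup": (0.829195022583, 0.105372905731),
}

# Speedup per test: how many times faster the pipelined version ran.
speedups = {name: plain / piped for name, (plain, piped) in timings.items()}
for name, ratio in speedups.items():
    print("%-15s %.2fx" % (name, ratio))

# Overall speedup across all four tests combined.
total_plain = sum(plain for plain, _ in timings.values())
total_piped = sum(piped for _, piped in timings.values())
overall = total_plain / total_piped
print("overall         %.2fx" % overall)
```

This prints speedups of about 6.53x, 3.75x, 5.18x and 7.87x, and 5.28x for the four tests combined, so "roughly 5 times" holds overall, with the hash updates benefiting least.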

Python vs PHP

I’m a PHP developer at heart and have been professionally since around 2007. I’ve dabbled in Ruby a fair few times but never stuck with it. I’m teaching myself Python at the moment as a fair few colleagues have coded in it and they all have good things to say about the language.

I know PHP fairly damn well - enough to know where to look for things which I don’t understand or know of, and I can figure out a new API or concept easily as I know the language. However with Python the concepts, syntax and semantics are somewhat different and I’m constantly finding myself in the situation where I want to do something simple, which I could do in 1 minute or less in PHP, but because I have no idea about Python I end up spending 5 times as long trying to figure it out.

I’ve decided to do a few posts on Python vs PHP where I’ll show how I’ve achieved the same outcome in both languages. As I’m learning the language there’s going to be some simple stuff here, and I apologise if any of it is wrong or daft, but if it helps someone who’s in the same position as me then I’m happy.

Here’s a list of Python vs PHP posts:

Also read a post about Learning Python where you can see some code comparisons between Python and PHP. There’s some discussion of interest on Hacker News.

Nginx Connect() Failed While Connecting to Upstream

I’ve just spent a good 30 minutes debugging my PHP code and Nginx config trying to figure out why I was getting an HTTP/1.1 502 Bad Gateway error in my browser and the following in my logs:

2012/07/11 22:52:18 [error] 23021#0: *29 kevent() reported that connect() failed (61: Connection refused) while connecting to upstream, client: 127.0.0.1, server: dev.foo.co.uk, request: "GET /foo HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "dev.foo.co.uk"

Turns out it’s pretty simple and god damn OBVIOUS (after you’ve found out of course).

php-fpm isn’t running.

This came up because I restarted my Mac the other day, and to resume development I ran a quick sudo nginx but forgot about sudo php-fpm.

My excuse is that my app’s index page goes to a static HTML file and the rest goes to PHP, so I was confused about why only parts of it were working!
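A quick way to confirm this kind of 502 is to check whether anything is listening on the upstream address from the log (127.0.0.1:9000 here). This is a minimal sketch using Python's standard socket module; port_is_open is a hypothetical helper name, not part of any tool mentioned above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex() returns 0 on success and an errno on failure, e.g.
        # ECONNREFUSED, which is the "61: Connection refused" in the macOS log.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_is_open("127.0.0.1", 9000):
        print("something is listening on 127.0.0.1:9000")
    else:
        print("nothing listening on 127.0.0.1:9000; is php-fpm running?")
```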

Mac Terminal Ssh Disconnecting

I switched to a Mac at the start of the year; prior to that I always used Linux and regularly ssh’d into remote boxes with hardly any problems and rarely any disconnects.

Since I started using the Mac for development and ssh’ing into remote boxes using iTerm, Terminal, TotalTerminal etc., I’ve kept having problems with ssh disconnecting, sometimes with the error message Write failed: Broken pipe.

To fix it just edit your ~/.ssh/config file and add the following options which will affect all hosts:

Host *
ServerAliveInterval 240
ServerAliveCountMax 3

ServerAliveInterval tells ssh to send a keepalive message through the connection whenever nothing has been received from the server for 240 seconds (4 minutes). ServerAliveCountMax tells ssh to send at most 3 of these without getting a response before closing the connection from the client side, so a dead connection is detected after roughly 240 × 3 = 720 seconds (12 minutes). This prevents the annoying lockup where the terminal freezes for a minute or two and then disconnects. Since I made these changes my terminal has behaved much more like my good old Linux laptop.

Deploy a Silex App Using Git Push

Up until a few days ago I used a small bash deployment script to deploy a few simple sites to my live box. The process was a git archive and extract, then an rsync to the live site. It was only when I inspected it recently that I realised rsync was no longer sending just the changes but all of the files; I’d never noticed because the sites were so small that the deploy was over very quickly. The rsync used to work fine because I deployed my current working copy, whose file timestamps matched those on the server. Since I started using git at home for dev, the git archive method stamps every file with the latest commit’s timestamp, and because rsync by default decides what to send by comparing file size and modification time, it treats nearly every file as changed.
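The failure mode is easier to see with a simplified model of rsync's default "quick check", which skips a file only when its size and modification time both match the destination. This is a hedged Python sketch of that rule, not rsync itself, and needs_transfer is a hypothetical name. rsync's --checksum option compares file contents instead, at the cost of reading every file on both sides.

```python
import os
import tempfile


def needs_transfer(src, dst):
    """Simplified rsync quick check: transfer unless size and mtime both match."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "index.php")
    dst = os.path.join(tmp, "live_index.php")
    for path in (src, dst):
        with open(path, "w") as f:
            f.write("<?php echo 'hello';\n")
        os.utime(path, (1341000000, 1341000000))  # same mtime on both sides

    unchanged = needs_transfer(src, dst)

    # Simulate git archive stamping the file with the latest commit's time:
    # the content is identical, but the mtime no longer matches.
    os.utime(src, (1341400000, 1341400000))
    after_archive = needs_transfer(src, dst)

print(unchanged, after_archive)  # False True
```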

My use case

I’ve been working on a very small Silex project which needs to be deployed to my live box, so I decided to set up a new deployment method, still using git but this time simplified into a git push à la Heroku. I trawled through a lot of pages and posts about git deployment looking for the simplest and most effective method that would fit my use case, which is as follows:

  • Store Silex app in git
  • Commit on develop branch and use master as the deployment branch
  • Push the code to my live box
  • Run an install script to setup Silex

The process

So for my use case of deploying a Silex application using git, here’s the technical breakdown:

  • Set up git to store the Silex app and install Silex using composer. Commit the composer.json and composer.lock files as these are used for the install. Use .gitignore to ignore the vendor/ directory and composer.phar.
  • Set up password-less ssh access to live server.
  • On the live server create a bare git repository where your changes will be pushed.
  • On the live server create a project directory for your code to be checked out to (this is the git working tree of your master branch checked out from the git bare repository from above).
  • Create a post-receive hook in the bare git repository which will checkout the tree to the project directory on a successful push then execute the Silex install script.
  • The install script will do the composer installation, installing Silex and dependencies to vendor/ if it’s not there or updating it if it is already.
  • Add the live server as a git remote.
  • git push to the live server to deploy the code and install the app.
  • Win.

.gitignore

This prevents git from pushing the comparatively large Silex vendor/ directory to the live box; instead we rely on the composer.json and composer.lock files, which are used to install Silex on the live box and make sure the versions are all correct.

vendor/
composer.phar

SSH password-less access

This simplifies ssh’ing to your live box and is generally a good thing to use. This is my ~/.ssh/config:

Host sagat
HostName sagat.foo.co.uk
User myuser
Port 22
IdentityFile ~/.ssh/my_key

Live server git setup

Here we create the bare git repository and then a directory for the tree to be checked out into:

$ mkdir /home/myuser/mysilexapp.git && cd !$
$ git init --bare
$ mkdir /home/myuser/mysilexapp

For a good explanation of git bare repositories go here.

Git post-receive hook

The post-receive hook is the most important part of this deployment method. The hook executes after a successful git push from a client machine, checks out the tree to our project directory, then runs the install script to update Silex. Also important is the unset GIT_DIR. Git sets GIT_DIR when it runs a hook, and without unsetting it Silex will not install correctly when you deploy from your dev machine: when the live box updates Silex using composer, composer and some of the dependencies rely on git, pick up the inherited GIT_DIR, get confused and break.
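Git hooks can be any executable, so here's a hedged sketch of the same three steps in Python, not necessarily how the original hook was written: check out the pushed tree into the project directory, drop GIT_DIR from the environment (the equivalent of unset GIT_DIR), then run the install script. The paths are the ones used in this post, install.sh is the install script from the listing further down, and the helper names are hypothetical.

```python
#!/usr/bin/env python3
import os
import subprocess

# Paths from this post's examples; adjust for your server.
GIT_DIR = "/home/myuser/mysilexapp.git"
WORK_TREE = "/home/myuser/mysilexapp"


def checkout_command(git_dir, work_tree, branch="master"):
    """Command that force-checks-out `branch` from the bare repo into work_tree."""
    return ["git", "--git-dir=" + git_dir, "--work-tree=" + work_tree,
            "checkout", "-f", branch]


def env_without_git_dir(env):
    """Copy of env with GIT_DIR removed, like `unset GIT_DIR` in a shell hook."""
    cleaned = dict(env)
    cleaned.pop("GIT_DIR", None)
    return cleaned


def main():
    subprocess.run(checkout_command(GIT_DIR, WORK_TREE), check=True)
    # composer (and the dependencies it fetches with git) would otherwise
    # inherit GIT_DIR and point at the bare deployment repository.
    subprocess.run(["./install.sh"], cwd=WORK_TREE,
                   env=env_without_git_dir(os.environ), check=True)


# git exports GIT_DIR when it runs a hook, so this only deploys when invoked
# by git, not when the file is run or imported by hand.
if __name__ == "__main__" and "GIT_DIR" in os.environ:
    main()
```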

Make sure the post-receive hook is executable:

$ chmod u+x /home/myuser/mysilexapp.git/hooks/post-receive

The Silex install script

I like to store my installation commands for a project in a separate executable bash file:

The script needs to cd to its own location because, when it is executed programmatically from the hook, it will think it’s working from /home/myuser/.

Git remote for live server

$ git remote add live ssh://sagat/home/myuser/mysilexapp.git

I like to call my remotes by names I recognise like ‘live’ or ‘github’. Git traditionally uses ‘origin’ as the default remote name so if you named it origin a git push would automatically push to the origin server. If you name it otherwise you will have to do a git push live.

Win

So with everything set up, here is an example of my deployment process:

james on moz-air in ~/code/mysilexapp (master)
 $ echo foo > foo 
james on moz-air in ~/code/mysilexapp (master)
 $ git add !$
james on moz-air in ~/code/mysilexapp (master)
 $ git commit -m "foo"
[master df6c85c] foo
 1 file changed, 1 insertion(+), 6 deletions(-)
james on moz-air in ~/code/mysilexapp (master)
 $ git push live
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 257 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
total 572
drwxr-xr-x 5 web web   4096 2012-07-04 18:01 .
drwxr-xr-x 6 web web   4096 2012-07-03 18:57 ..
-rw-r--r-- 1 web web     57 2012-07-03 19:03 composer.json
-rw-r--r-- 1 web web   2657 2012-07-03 19:03 composer.lock
-rwxr-xr-x 1 web web 532630 2012-07-04 17:19 composer.phar
-rw-r--r-- 1 web web      4 2012-07-04 18:01 foo
-rw-r--r-- 1 web web     22 2012-07-04 17:19 .gitignore
-rwxr-xr-x 1 web web    101 2012-07-03 19:03 install.sh
drwxr-xr-x 3 web web   4096 2012-07-03 19:03 lib
drwxr-xr-x 6 web web   4096 2012-07-03 23:12 vendor
drwxr-xr-x 2 web web   4096 2012-07-03 19:03 web
#!/usr/bin/env php
All settings correct for using Composer
Downloading...

Composer successfully installed to: /home/myuser/mysilexapp/composer.phar
Use it: php composer.phar
Installing dependencies from lock file
Nothing to install or update
Generating autoload files
To ssh://sagat/home/myuser/mysilexapp.git
   ad5e1c1..df6c85c  master -> master

Pinterest API

I’ve started working on my homepage jmoz.co.uk, which I’m building in Python/Django, mainly as a learning exercise. One of the things I wanted was to pull in some of my content from Pinterest; unfortunately their API isn’t working or hasn’t been built (as of June 2012).

I built a small scraper which pulls my pins and spits them out as JSON. I built it using Silex, a small lightweight framework built on the Symfony2 components.

I’ve deployed the scraper as Pinterest API. Give it a go and see if it can provide some of your Pinterest content for your app, and make sure you cache it!

Parsing HTML With DOMDocument and DOMXPath::query

The other day I needed to do some HTML scraping to trim out some repeated data stuck inside nested divs and produce a simplified array of said data. My first port of call was SimpleXML, which I have used many times. However, this time the son of a bitch just wouldn’t work with me and kept throwing up parsing errors. I lost my patience with it and decided to give DOMDocument and DOMXPath a go, which I’d heard of but never used.

All of the examples I could find showed basic parsing of HTML, e.g. grabbing all the a elements on a page, or grabbing all the child book elements in a bookstore XML document. I needed to grab all the divs with a certain class, then further parse their HTML to pull out bits of data. Here’s the HTML I’m working with for this example:

<html>
    <body>
        <h1>Foo</h1>
        <div id="content">
            <div class="foo">
                <div><img class="fooimage" src="http://foo.com/bar.png" /></div>
                <p class="description">Foo bar</p>
            </div>
            <div class="foo">
                <div><img class="fooimage" src="http://foo.com/baz.png" /></div>
                <p class="description">Baz bat</p>
            </div>
        </div>
    </body>
</html>

Using DOMXPath::query to extract html data

I want an array of image src attribute values along with the text of the p elements. The basic method is to query for the div.foo elements using DOMXPath::query, which returns a DOMNodeList. Iterating over the list gives one DOMNode per iteration. Once we have the DOMNode for a div.foo element, we query again with DOMXPath::query but, crucially, pass the node as the second parameter, $contextnode. This makes the query relative to the div.foo node and lets us grab the child img element’s src attribute and the description from the p element. The relative expression needs to start with a . (e.g. .//img rather than //img), because an expression beginning with / or // is evaluated from the document root even when a context node is given.

Here’s the test script:

Which gives the desired result:

Array
(
    [0] => Array
        (
            [image_src] => http://foo.com/bar.png
            [desc] => Foo bar
        )

    [1] => Array
        (
            [image_src] => http://foo.com/baz.png
            [desc] => Baz bat
        )

)
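For comparison, here's a hedged Python sketch of the same two-step technique using the standard library's xml.etree.ElementTree: find every div.foo, then run relative queries (note the leading .) with each div as the context. ElementTree supports only a subset of XPath and needs well-formed markup, which this sample happens to be; real-world HTML usually needs a forgiving parser such as lxml.html. extract_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET

HTML = """<html>
    <body>
        <h1>Foo</h1>
        <div id="content">
            <div class="foo">
                <div><img class="fooimage" src="http://foo.com/bar.png" /></div>
                <p class="description">Foo bar</p>
            </div>
            <div class="foo">
                <div><img class="fooimage" src="http://foo.com/baz.png" /></div>
                <p class="description">Baz bat</p>
            </div>
        </div>
    </body>
</html>"""


def extract_items(markup):
    root = ET.fromstring(markup)
    items = []
    # First query: every div with class="foo", anywhere in the document.
    for div in root.findall(".//div[@class='foo']"):
        # Second queries are relative to `div`, like passing it as the
        # context node to DOMXPath::query().
        img = div.find(".//img[@class='fooimage']")
        desc = div.find("p[@class='description']")
        items.append({"image_src": img.get("src"), "desc": desc.text})
    return items


print(extract_items(HTML))
```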

Symfony2 FOSUserBundle Role Entities

I’ve been working with FOSUserBundle Roles recently, adding database persistence using a simple Doctrine array mapping. We needed a better implementation where the Roles could be managed from the database and dynamically added to and removed from a User through an admin interface. I found quite a few posts on Stack Overflow and the Symfony2 Google group asking how best to implement Role entities, but very few answers or solutions, and no documentation on the FOSUserBundle GitHub page.

Symfony2’s security system is one of the most complex parts of the framework. Couple this with FOSUserBundle’s weird AOP implementation and you’ve basically got a big massive ball ache when it comes to figuring out what’s going on. After staring through a lot of code and not really getting anywhere, I figured I’d write some tests to cover the existing functionality then refactor the User and Role classes until everything went green.

Unit test to cover existing functionality

For reference here is the unit test I was working with.

Creating the Role entity

FOSUserBundle provides no Role entity. Symfony2 provides a Role class, but some of its members are private, so we just implement the Role interface, hoping this will ensure the new Role implementation works correctly with the security system.

It’s a simple object. I’m implementing a __toString() method so we can loop over User::getRoles() in the template and echo each $role.

Creating the User, Role relationship

This is the User class with the Role relationship mapped. I tried to implement the same Role functionality as FOSUserBundle. You are restricted to certain method parameters due to the type hinting in the parent class, e.g. setRoles() must take an array. I also found type hinting and return type expectations in some of the Symfony2 security layer code, such as:

UsernamePasswordToken::__construct($user, $credentials, $providerKey, array $roles = array())

Because of this, I mixed an ArrayCollection implementation with an array one. You can see I also provided (set|get)RolesCollection() methods to make things easier when working with Doctrine.

Modify the schema

Check the changes to the database:

$ php app/console --env=test doctrine:schema:update --em=user --dump-sql
CREATE TABLE security_roles (id INT AUTO_INCREMENT NOT NULL, role VARCHAR(70) NOT NULL, UNIQUE INDEX UNIQ_5A82CD6D57698A6A (role), PRIMARY KEY(id)) ENGINE = InnoDB;
CREATE TABLE security_users_roles (user_id INT NOT NULL, role_id INT NOT NULL, INDEX IDX_71E6DDEFA76ED395 (user_id), INDEX IDX_71E6DDEFD60322AC (role_id), PRIMARY KEY(user_id, role_id)) ENGINE = InnoDB;
ALTER TABLE security_users_roles ADD CONSTRAINT FK_71E6DDEFA76ED395 FOREIGN KEY (user_id) REFERENCES security_users(id) ON DELETE CASCADE;
ALTER TABLE security_users_roles ADD CONSTRAINT FK_71E6DDEFD60322AC FOREIGN KEY (role_id) REFERENCES security_roles(id) ON DELETE CASCADE;
ALTER TABLE security_users DROP roles

Either run this using the --force parameter or create a migration.

Summary

I managed to get all the tests to pass:

 $ phpunit --stop-on-error --stop-on-failure -c app src/JMOZ/Bundle/SecurityBundle/Tests/Functional/SecurityTest.php
PHPUnit 3.5.5 by Sebastian Bergmann.

...........

Time: 30 seconds, Memory: 82.25Mb

OK (11 tests, 35 assertions)

I would rather not have mixed the array/ArrayCollection implementation, but was forced to do so by the existing functionality and interfaces. I’d have liked to see some other implementations or solutions but could not find any. If anyone has seen anything better please let me know. I hope this helps someone out there.