Half-kun's Summary Notes

Half-kun, who only listens to half of what he's told

How to automatically download Japanese magazines for Her Majesty the Queen – hacking in progress

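Her Majesty the Queen wants to read magazines such as with or mina in ComicGlass on her iPad. To make downloading them automatic, here is the plan:

  1. A PyQt UI? It would show which magazines to download and which ones are already cached
  2. A service or cron job that automatically polls the magazine download sites for the latest list of magazines
  3. Ideally, the UI would talk directly to JDownloader or pyLoad and show the current download queue and similar information (this one feels beyond my abilities)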
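Ideas 1 and 2 boil down to comparing a freshly scraped listing with what has already been seen, then keeping only the magazines the Queen asked for. A minimal Python 3 sketch, assuming the scraper returns (name, link) pairs; `find_new` and `poll_once` are hypothetical names, and fetching the listing is left to whatever callable you pass in, so a cron job could simply call `poll_once` once per run.

```python
from typing import Callable, Iterable, List, Set, Tuple

Magazine = Tuple[str, str]  # (name, link); the shape is an assumption


def find_new(seen_links: Set[str], listing: Iterable[Magazine]) -> List[Magazine]:
    """Return entries whose link has not been seen before, in listing order.

    seen_links is updated in place, which also drops repeats within one listing.
    """
    new = []
    for name, link in listing:
        if link not in seen_links:
            new.append((name, link))
            seen_links.add(link)
    return new


def poll_once(fetch_listing: Callable[[], Iterable[Magazine]],
              seen_links: Set[str],
              wanted: Set[str]) -> List[Magazine]:
    """One polling pass: every new entry is marked as seen,
    but only the magazines in `wanted` are returned for download."""
    return [(name, link)
            for name, link in find_new(seen_links, fetch_listing())
            if name in wanted]
```

Because `seen_links` is mutated, a second pass over the same listing returns nothing; in a real cron job that set would have to be persisted between runs.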

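At a glance, the technologies needed and the problems to solve:

  • A Python web scraping module, e.g. Scrapy (perhaps overkill…) / BeautifulSoup / mechanize, or for simple cases just my own regex / webscraping
  • Build the UI with plain PyQt, or with the newer QML (Qt Quick)?
  • How to control pyLoad on the Synology NAS and show download status? Or should I instead use the JDownloader CLI on a desktop machine, with a cron job on the office computer?
  • Use a config file to store settings
  • Use SQLite to store data about cached magazines (in preparation for playing with an iOS app later)
  • For running it as a service/daemon, perp might be needed on Linux and pywin32 on Windows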
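The "config for settings, SQLite for the cache" items could look roughly like this in Python 3 with only the standard library (`configparser` and `sqlite3`). This is a sketch: the section names, keys, table layout and the `http://nas:8000` address are all hypothetical, made up for illustration.

```python
import configparser
import sqlite3

# Hypothetical settings file contents (section and key names are made up).
DEFAULT_CONFIG = """
[pyload]
url = http://nas:8000
[magazines]
wanted = with, mina
"""


def load_settings(text):
    """Parse the settings text; returns (pyLoad URL, list of wanted magazines)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    wanted = [m.strip() for m in parser.get('magazines', 'wanted').split(',')]
    return parser.get('pyload', 'url'), wanted


def open_cache(path=':memory:'):
    """Open (or create) the SQLite cache of magazines seen so far."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS magazine (
                      link   TEXT PRIMARY KEY,
                      name   TEXT NOT NULL,
                      cached INTEGER NOT NULL DEFAULT 0)""")
    return db


def remember(db, name, link):
    """Insert a magazine once; returns True only if it was not known yet."""
    cur = db.execute("INSERT OR IGNORE INTO magazine (link, name) VALUES (?, ?)",
                     (link, name))
    db.commit()
    return cur.rowcount == 1


def mark_cached(db, link):
    """Flag a magazine as downloaded, so the UI can list what is cached."""
    db.execute("UPDATE magazine SET cached = 1 WHERE link = ?", (link,))
    db.commit()
```

Passing a file path instead of `':memory:'` to `open_cache` keeps the cache between runs, which is also what a later iOS app could read.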

Design diagram:

[Image: jap mag dl]

Progress:

2013/10/20:

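  • Install pyLoad on the Synology and test whether it can download magazines. First I found that an expert here has already built a pyLoad package (previously, it seems, you had to ssh into the NAS, install ipkg and go through a pile of complicated steps). Next I installed an ssh client on Windows 7 to manage the NAS remotely.
    After installing, I found that pyLoad uses port 8000 by default, the same as the built-in Download Station, so its web UI was unreachable. I had to ssh into the NAS and run the setup again. Editing the config file would actually have been enough, but I didn't know where it was, so: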
    /var/packages/pyload/scripts/start-stop-status setup
    

    One of the steps lets you set the port; after restarting the daemon I could log in. During this setup it also mentions that the config file is at /volume1/@appstore/pyload/var/config

    I added a test download and found that CAPTCHAs still have to be entered manually…
    [Screenshot: pyload]

  • Test whether I can "remote-control" pyLoad; read the API docs

2013/10/22:

  • Tested adding downloads through pyLoad's HTTP/JSON API: it works
    import json
    import requests

    # pyLoad web UI address: placeholder, replace with your NAS host (and port)
    API = 'http://pyload-core/api'

    # First, log in
    payload = {'username': 'myUsername', 'password': 'myPassword'}
    r = requests.post(API + '/login', data=payload)

    # Experiment #1: pass the session id explicitly
    # every argument value must be JSON-encoded
    payload = {'name': 'xxx mag', 'links': ['http://depositfiles.com/files/6u8gr6yx5']}
    payloadJSON = {k: json.dumps(v) for k, v in payload.items()}
    payloadJSON['session'] = '31241191be328c656d562c1d90f640ad'  # session id obtained from the login step

    rs = requests.post(API + '/addPackage', data=payloadJSON)

    # Experiment #2: log in, then reuse the login cookies instead of a session id
    payload = {'username': 'myUsername', 'password': 'myPassword'}
    r = requests.post(API + '/login', data=payload)

    payload = {'name': 'xxx mag', 'links': ['http://depositfiles.com/files/6u8gr6yx5']}
    payloadJSON = {k: json.dumps(v) for k, v in payload.items()}

    rs = requests.post(API + '/addPackage', data=payloadJSON, cookies=r.cookies)

    # Experiment #3, the final version; this one seems the most pythonic
    payload = {'username': 'myUsername', 'password': 'myPassword'}
    with requests.Session() as s:
        # the session keeps the login cookie for the following requests
        s.post(API + '/login', data=payload)
        payload = {'name': 'xxx mag', 'links': ['http://depositfiles.com/files/6u8gr6yx5']}
        # Python 2.6 (no dict comprehensions)
        payloadJSON = dict((k, json.dumps(v)) for k, v in payload.iteritems())
        # Python 2.7+
        # payloadJSON = {k: json.dumps(v) for k, v in payload.items()}
        r = s.post(API + '/addPackage', data=payloadJSON)
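
For the eventual UI, the three experiments can be folded into one small helper. A Python 3 sketch using only the standard library, with `urllib` plus a cookie jar standing in for `requests.Session`. `PyLoadClient`, `encode_api_args` and the base URL are hypothetical names, not part of pyLoad. The sketch assumes, as the experiments do, that `addPackage` arguments must be JSON-encoded and that the login cookie authenticates later calls; it also assumes the API answers with JSON.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def encode_api_args(**kwargs):
    """JSON-encode every argument value, as done for addPackage above."""
    return {key: json.dumps(value) for key, value in kwargs.items()}


class PyLoadClient:
    """Tiny pyLoad API client (hypothetical helper, not part of pyLoad)."""

    def __init__(self, base_url):
        # e.g. 'http://nas:8000' (placeholder): host and port of the pyLoad web UI
        self.base_url = base_url.rstrip('/')
        # the cookie jar keeps the session cookie set by /api/login,
        # playing the role of requests.Session in experiment #3
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))

    def _post(self, method, form):
        data = urllib.parse.urlencode(form).encode('utf-8')
        url = '%s/api/%s' % (self.base_url, method)
        with self.opener.open(url, data=data) as resp:
            return json.loads(resp.read().decode('utf-8'))

    def login(self, username, password):
        # login takes plain form fields, not JSON-encoded ones
        return self._post('login', {'username': username, 'password': password})

    def add_package(self, name, links):
        return self._post('addPackage', encode_api_args(name=name, links=links))
```

Usage would be `c = PyLoadClient('http://nas:8000')`, then `c.login('myUsername', 'myPassword')` and `c.add_package('xxx mag', ['http://depositfiles.com/files/6u8gr6yx5'])`.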
    

2013/10/23:

  • Testing the webscraping library
    from webscraping import download, xpath

    D = download.Download()
    # download (and cache) the zasshiko.com page listing all magazines
    html = D.get('http://zasshiko.com/all.php')

    # XPath tests
    # top-level navigation entries (all, #, A, B, C, D ...):
    #   .//*[@id='nav-one']//li
    # links to every magazine:
    #   .//ul[@id='nav-one']/li//li//a

    # magazine names
    xpath.search(html, '//ul[@id="nav-one"]/li//li//a')
    # ['25ans', 'ageha', 'anan', 'AneCan', ...]

    # URL of each magazine's list page
    xpath.search(html, '//ul[@id="nav-one"]/li//li//a/@href')
    # ['http://www.zasshiko.com/magazine_list.php?show_cat=27', ... ]
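
The magazine-link XPath above can also be tried offline with Python 3's standard library: `xml.etree.ElementTree`'s `findall` understands this subset of XPath (`//` descendants, `/` children and `[@attr='value']` predicates). One big assumption: the markup below is a hypothetical, simplified stand-in shaped like what the XPath implies (a `ul#nav-one` of letter tabs, each holding a nested list of magazine links). The real page is not well-formed XML, and the hrefs here are placeholders. The sample also shows why the path needs `li//li//a`: the tab's own link is skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT copied from zasshiko.com.
SAMPLE = """
<div>
  <ul id="nav-one">
    <li><a href="#A">A</a>
      <ul>
        <li><a href="magazine_list.php?show_cat=1">AneCan</a></li>
        <li><a href="magazine_list.php?show_cat=2">anan</a></li>
      </ul>
    </li>
  </ul>
</div>
"""


def magazine_links(markup):
    """Return (name, href) pairs for every magazine link under ul#nav-one."""
    root = ET.fromstring(markup.strip())
    # same path as in the webscraping test, evaluated relative to the root element
    anchors = root.findall(".//ul[@id='nav-one']/li//li//a")
    return [(a.text, a.get('href')) for a in anchors]
```

`magazine_links(SAMPLE)` yields the two magazine entries but not the "A" tab link, which sits directly under the top-level `li`.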
    
    

2013/10/25:

  • Testing BeautifulSoup
    import re
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://zasshiko.com/all.php")
    soup = BeautifulSoup(r.content, "html.parser")
    # each magazine <a> has an onmouseover="Tip(...)" tooltip that contains HTML
    for a in soup.find_all('a', onmouseover=re.compile('^Tip')):
        magName = a.text
        # parse the tooltip's HTML snippet and take the link inside it
        soupMag = BeautifulSoup(a['onmouseover'], "html.parser")
        magLink = soupMag.find('a')['href']
        print magName, magLink
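
The same tooltip trick can be reproduced without BeautifulSoup, using only `html.parser` from Python 3's standard library. One parser finds `<a>` tags whose `onmouseover` starts with `Tip` and collects their text; a second parser reads the HTML snippet inside that attribute to get the first `href`. A sketch only: the sample markup and the example.com link are hypothetical, and it assumes the tooltip HTML is entity-escaped inside the attribute, which `html.parser` unescapes before passing attributes to `handle_starttag`.

```python
from html.parser import HTMLParser


class _FirstHref(HTMLParser):
    """Remember the href of the first <a> tag in a snippet."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.href is None:
            self.href = dict(attrs).get('href')


class TipLinkParser(HTMLParser):
    """Collect (link text, tooltip href) for <a onmouseover="Tip(...)"> tags."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._tip = None  # onmouseover value of the Tip link currently open
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and (attrs.get('onmouseover') or '').startswith('Tip'):
            self._tip = attrs['onmouseover']
            self._text = []

    def handle_data(self, data):
        if self._tip is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._tip is not None:
            # the attribute value is itself HTML: parse it to find the real link
            inner = _FirstHref()
            inner.feed(self._tip)
            inner.close()
            self.results.append((''.join(self._text).strip(), inner.href))
            self._tip = None


# Hypothetical sample: one ordinary link and one Tip link.
SAMPLE = (
    '<a href="/index.php">home</a>'
    '<a href="#" onmouseover="Tip(\'&lt;a href=&quot;http://example.com/ageha&quot;'
    '&gt;ageha&lt;/a&gt;\')">ageha</a>'
)
```

Feeding `SAMPLE` into a `TipLinkParser` leaves one `(name, link)` pair in `results`; the plain "home" link is ignored because it has no `Tip` tooltip.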
    

2013/10/27:

  • IDLE can't paste multi-line code, so I downloaded IdleX

2013/11/3:

  • Learning JavaScript and jQuery on Codecademy, to prepare for building the UI with QML

To be continued...
