forked from scrapy/scrapy
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsep-008.trac
102 lines (74 loc) · 3.11 KB
/
sep-008.trac
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
= SEP-008 - Item Loaders =
[[PageOutline(2-5,Contents)]]
||'''SEP:'''||8||
||'''Title:'''||Item Parsers||
||'''Author:'''||Pablo Hoffman||
||'''Created:'''||2009-08-11||
||'''Status'''||Final (implemented with variations)||
||'''Obsoletes'''||[wiki:SEP-001], [wiki:SEP-002], [wiki:SEP-003], [wiki:SEP-005]||
== Introduction ==
Item Parser is the final API proposed to implement Item Builders/Loader proposed in [wiki:SEP-001].
'''NOTE:''' This is the API that was finally implemented with the name "Item Loaders", instead of "Item Parsers" along with some other minor fine tuning to the API methods and semantics.
== Dataflow ==
1. !ItemParser.add_value()
1. '''input_parser'''
2. store
2. !ItemParser.add_xpath() ''(only available in XPathItemLoader)''
1. selector.extract()
2. '''input_parser'''
3. store
3. !ItemParser.populate_item() ''(ex. get_item)''
1. '''output_parser'''
2. assign field
== Modules and classes ==
* scrapy.contrib.itemparser.!ItemParser
* scrapy.contrib.itemparser.XPathItemParser
* scrapy.contrib.itemparser.parsers.!MapConcat ''(ex. !TreeExpander)''
* scrapy.contrib.itemparser.parsers.!TakeFirst
* scrapy.contrib.itemparser.parsers.Join
* scrapy.contrib.itemparser.parsers.Identity
== Public API ==
* !ItemParser.add_value()
* !ItemParser.replace_value()
* !ItemParser.populate_item() ''(returns item populated)''
* !ItemParser.get_collected_values() ''(note the 's' in values)''
* !ItemParser.parse_field()
* !ItemParser.get_input_parser()
* !ItemParser.get_output_parser()
* !ItemParser.context
* !ItemParser.default_item_class
* !ItemParser.default_input_parser
* !ItemParser.default_output_parser
* !ItemParser.''field''_in
* !ItemParser.''field''_out
== Alternative Public API Proposal ==
* !ItemLoader.add_value()
* !ItemLoader.replace_value()
* !ItemLoader.load_item() ''(returns loaded item)''
* !ItemLoader.get_stored_values() or !ItemLoader.get_values() ''(returns the !ItemLoader values)''
* !ItemLoader.get_output_value()
* !ItemLoader.get_input_processor() or !ItemLoader.get_in_processor() ''(short version)''
* !ItemLoader.get_output_processor() or !ItemLoader.get_out_processor() ''(short version)''
* !ItemLoader.context
* !ItemLoader.default_item_class
* !ItemLoader.default_input_processor or !ItemLoader.default_in_processor ''(short version)''
* !ItemLoader.default_output_processor or !ItemLoader.default_out_processor ''(short version)''
* !ItemLoader.''field''_in
* !ItemLoader.''field''_out
== Usage example: declaring Item Parsers ==
{{{
#!python
from scrapy.contrib.itemparser import XPathItemParser, parsers
class ProductParser(XPathItemParser):
name_in = parsers.MapConcat(removetags, filterx)
price_in = parsers.MapConcat(...)
price_out = parsers.TakeFirst()
}}}
== Usage example: declaring parsers in Fields ==
{{{
#!python
class Product(Item):
name = Field(output_parser=parsers.Join(), ...)
price = Field(output_parser=parsers.TakeFirst(), ...)
description = Field(input_parser=parsers.MapConcat(removetags))
}}}