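Web pages are built with increasingly complex techniques, which makes scraping data from them harder. Some pages use newer techniques, and if you view the page source directly you will not find the data in it. How do you collect the data in that case? The example below shows how to scrape fields whose content does not appear in the page source. Example URL: https://shopee.com.my/Casio-1314D-Couple-Men's-Women's-Analogue-Casual-Watch-with-Box-i.640103.1748719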
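Because the page source contains no data, the usual scraping approach will not work. If you know how to use a packet capture tool such as Fiddler, use it to find the real URL that returns the data. If you do not, try the developer tools built into your browser, which usually open with F12. For details on using these tools, see the other articles on this site; this article only covers how to obtain the real data URL.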
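At this point you can look for the real data URL among the captured requests. Such URLs usually contain strings like api or ajax. In this example the URL is https://shopee.com.my/api/v2/shop/get_hot_sales?limit=8&offset=0&shopid=640103. Try opening it directly. If it opens normally, compare the URL that returns the data with the page URL that does not, and look for what the two have in common; you can then scrape that shared value and generate the data URL from it. Here the two URLs share one value: the shop ID 640103, which the data URL passes as the shopid parameter. So when scraping, first collect the shopid, build the data URL from it, and then process the fields you want. Once you have the data URL, you can view the data in its source and extract the fields you need. The full response is too long to include here, so an excerpt is shown below for reference.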
{
  "version": "2034dbdea40d6bf3bbeecf4838e503d8",
  "data": {
    "items": [
      {
        "itemid": 604360974,
        "price_max_before_discount": 9900000,
        "item_status": "normal",
        "can_use_wholesale": false,
        "show_free_shipping": true,
        "estimated_days": 2,
        "is_hot_sales": null,
        "is_slash_price_item": false,
        "upcoming_flash_sale": null,
        "slash_lowest_price": null,
        "condition": 1,
        "add_on_deal_info": null,
        "is_non_cc_installment_payment_eligible": false,
        "categories": null,
        "ctime": 1508420410,
        "name": "Casio V002L Series Men's/Women's/Couple Analog Leather Date with Box",
        "show_shopee_verified_label": true,
        "size_chart": null,
        "is_pre_order": false,
        "service_by_shopee_flag": 0,
        "historical_sold": 1819,
        "reference_item_id": "",
        "recommendation_info": null,
        "bundle_deal_info": null,
        "price_max": 6300000,
        "has_lowest_price_guarantee": false,
        "shipping_icon_type": null,
        "images": [
          "b273f7c58ba695207ce81f074c9da82e",
          "e9dc3cecc200a1045568b7821a9a3985",
          "24622b03b5fa6f54d61cc085eea17749",
          "d5815e1b9d640e2018232aa9ddfa1f31",
          "f808c539b6913af4cf71b05f1d4f1936",
          "413a1301822e12ba753aa07d3f76b56f",
          "71b3f321b83e4918883b4b8b1ba5085b",
          "b8e82b0f77362606bc7b058c7739df90",
          "518ad15b391fed2189ae22459d37b2d3"
        ],
        "price_before_discount": 9900000,
        "cod_flag": null,
        "catid": 159,
        "is_official_shop": false,
        "coin_earn_label": null,
        "hashtag_list": null,
        "sold": 13,
        "makeup": null,
        "item_rating": {
          "rating_star": 4.880287,
          "rating_count": [ 1253, 7, 4, 17, 76, 1149 ],
          "rcount_with_image": 317,
          "rcount_with_context": 672
        },
        "show_official_shop_label_in_title": false,
        "discount": "36%",
        "reason": null,
        "label_ids": [ 1000075, 1000061 ],
        "bundle_deal_id": 0,
        "is_group_buy_item": null,
        "description": "100% original\\n\\n1 year warranty\\n\\nCase / bezel material: Resin\\n\\nResin Band\\n\\nResin Glass\\n\\nWater Resistant\\n\\nLED backlight Afterglow Dual time 1/100-second stopwatch Measuring capacity: 23:59'59.99'' Measuring modes: Elapsed time, split time, 1st-2nd place times\\n\\nMulti-function alarm (One-time or 7-time snooze)\\n\\nHourly time signal\\n\\nAuto-calendar (to year 2099) 12/24-hour format\\n\\nRegular timekeeping: Hour, minutes, seconds, pm, month, date, day Accuracy: ±30 seconds per month\\n\\nApprox. battery life: 10 years on CR2025\\n\\nSize of case : 44.1×40×11.5mm\\n\\nTotal weight : 27.3g\\n\\n#jammurah #casio #WATCH #digitalwatch #hadiahbirthday #CASIODIGITAL #DIGITALWATCH #F200W",
        "makeups": null,
        "welcome_package_type": 0,
        "show_official_shop_label_in_normal_position": null,
        "item_type": 0
      }
    ]
  },
  "error_msg": null,
  "error": 0
}
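The whole procedure (take the shop ID from the page URL, generate the data URL, then pull fields out of the JSON) can be sketched in Python roughly as follows. This is only an illustration under several assumptions. The function names are hypothetical. The `-i.<shopid>.<itemid>` URL pattern is inferred from the single example URL above. The chosen fields are arbitrary. The endpoint may have changed since this was written, or may require extra headers or cookies, so `fetch_json` is not called in the demo.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Data URL observed in the example above.
API_BASE = "https://shopee.com.my/api/v2/shop/get_hot_sales"


def extract_shop_id(product_url):
    """Return the shop ID from a URL ending in -i.<shopid>.<itemid>.

    Assumption: this pattern is inferred from the one example URL only.
    """
    match = re.search(r"-i\.(\d+)\.(\d+)(?:[/?#]|$)", product_url)
    if match is None:
        raise ValueError("no -i.<shopid>.<itemid> suffix in URL")
    return int(match.group(1))


def build_hot_sales_url(shop_id, limit=8, offset=0):
    """Generate the data URL for a shop, using the parameters seen above."""
    query = urlencode({"limit": limit, "offset": offset, "shopid": shop_id})
    return f"{API_BASE}?{query}"


def fetch_json(url, timeout=10):
    """Download and decode a JSON response.

    Assumption: a plain GET with a browser-like User-Agent is enough;
    the real site may demand more than this.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_fields(payload, fields=("itemid", "name", "price_max",
                                    "historical_sold", "discount")):
    """Pick the wanted keys from every entry in data.items."""
    items = (payload.get("data") or {}).get("items") or []
    return [{key: item.get(key) for key in fields} for item in items]


# A trimmed copy of the excerpt above, used so the demo runs offline.
SAMPLE = """
{"data": {"items": [{"itemid": 604360974,
  "name": "Casio V002L Series Men's/Women's/Couple Analog Leather Date with Box",
  "price_max": 6300000, "historical_sold": 1819, "discount": "36%"}]},
 "error_msg": null, "error": 0}
"""

if __name__ == "__main__":
    page = ("https://shopee.com.my/Casio-1314D-Couple-Men's-Women's-"
            "Analogue-Casual-Watch-with-Box-i.640103.1748719")
    print(build_hot_sales_url(extract_shop_id(page)))
    for row in extract_fields(json.loads(SAMPLE)):
        print(row)
```

To run this against the live site, replace `json.loads(SAMPLE)` with `fetch_json(build_hot_sales_url(shop_id))`. Values such as `price_max` are returned exactly as they appear in the response, with no unit conversion applied.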