Using Scala to analyze English articles and find the words I don't know

Background: before reading an English article, I want to know in advance which of its words I don't know.

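1. Drop in the English article.

2. Compare its words against the words I already know, and pick out the ones I don't.

3. Use Python to send GET/POST requests that look up the meaning of each unknown word and post the results to Slack.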
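The comparison in step 2 is a set difference. The Spark job itself is written in Scala and its source is not shown in this post, so the following is only a minimal plain-Python sketch of the same idea. The function names and the tokenizing regex are assumptions made for illustration, not the author's code.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of letters, optionally joined by "-" or "'"
# (so "blur-up" and "don't" stay whole). The real job may tokenize differently.
WORD_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def tokenize(text):
    """Lower-case the text and return the set of word-like tokens."""
    return set(WORD_RE.findall(text.lower()))


def load_words(directory):
    """Union of the tokens found in every file directly under `directory`."""
    words = set()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            words |= tokenize(path.read_text(encoding="utf-8"))
    return words


def unknown_words(article_dir, known_dir):
    """Words that occur in the new articles but not in the known-word files."""
    return load_words(article_dir) - load_words(known_dir)
```

Calling `unknown_words(inputpath, sampledata)` with the two directories described below would return the set of words still to learn.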

 

Directory that holds the new articles:

inputpath=/home/gavin/desktop/scala_code/dwh/resource/wordCount/unknow

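Directory that holds the words I already know. The words are stored in files, and you can add as many files as you like; for readability I name each file after a date:

sampledata=/home/gavin/desktop/scala_code/dwh/resource/wordCount/learndone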

Submit the job:

spark-submit --class scala.com.scpman.WordCount target/scala-2.11/dwh-assembly-1.0.jar --inputpath=/home/gavin/desktop/scala_code/dwh/resource/wordCount/unknow --sampledata=/home/gavin/desktop/scala_code/dwh/resource/wordCount/learndone > a.txt

$ cat a.txt
inputpath = /home/gavin/desktop/scala_code/dwh/resource/wordCount/unknow
sampledata = /home/gavin/desktop/scala_code/dwh/resource/wordCount/learndone
-----------------------------
You already Learn 400 words!
Set(fyi, content, at, my, latest, all, document, cloudfront, sameorigin, content-length, library, it, very, smart, order, task, using, keeping, text, good, another, from , gonna, while, age, connection, preload, processing, payload, time, trying, doesn't, responsive, in, card, log, these, into, access-control-allow-origin, aliceng, live, due, block, solid, human, server, referrer, loopback, shown, where, draw, deniselau, forced, as, user’s, cms, placed, out, on, words, question, fact, warn, me, is, already, a, having, be, bergen, failed, was, just, length, doesnt, eth0, spark_local_ip, move, referrer-policy, deadline, changed, asdf, recent, displaying, tiny, view, main, fetches, animation, how, svg-based, operation, svgs, vector-based, asap, techniques, remove, use, performance, keep-alive, colour, several, html, transport, want, backend, who, manage, experience, features, remember, factory, increase, h, really, gx_otxmsyjy0wfu5psunzbxwhdt5x-l5irtj9v1lhzq2nzns63v7gw==, point, put, .previous, scaled, happiness, previously, post, background, thu, support, between, classes, loads, frontend, jumping, it’s, explained, extra, tenforget, route, f12119dc59597a3cbedac2ac64405829.cloudfront.net, subdomain, instead, natan, drawing, image, learn, for, to, look, 1.1, hello, also, app, remembers, areas, gift, bitmap, lived, .wo.ni, effect, set, know, i, so, before, initial, some, loading, display, better, matter, progressive, their, rather, need, of, confluence, limit, x-cache, scale, useful, lots, only, changes, placeholder, sprint14, here, successfully, hi, since, product, cart, include, native-hadoop, then, fix, are, images, deployment, create, find, technique, people, strict-transport-security, after, kbs, paid, profile, version, pdp, an, bad, utils, any, you, take, wed, didn't, browser, imagine, gmt, text/html, other, little, layout, no, from, once, have, dec, we, prevents, delete, today, inlined, do, can, based, check, logic, that, firefox, result, content-type, item, 
description, them, around, or, calculations, webp, settings, well, space, could, keep, tap, word, load, x-slack-backend, slack, empty, world, don’t, cost, does, developing, nativecodeloader, area, issue, placeholders, miss, creating, x-amz-cf-id, making, picture, goal, security, vector, via, artifacts, part, edges, idea, hacks, up, choosing, work, cases, resolves, generated, export/import, when, generate, maintenance, this, shapes, havebut, re, cod, address, import, dont, wow, usually, overhead, regions, code, might, admin, sync, export, fill, bind, sth, more, interface, written, facts, please, new, one, most, preview, what, end, max-age=31536000, by, we'd, ., colours, getting, called, should, if, experiments, don't, builtin-java, localhost, posts, platform.., message, didn’t, decided, date, but, many, automatically, screen, every, accept-encoding, pixels, test, full, previous, there, the, apache, x-frame-options, added, about, turns, max, will, includesubdomains, options, our, ideal, which, final, eddy.ck, both, not, no-referrer, examples, leave, help, unable, change, loaded, user, thanks, old, those, hostname, dominant, your, let’s, render, leaving, design, future, small, come, and, size, get, medium, vary, request, payment, with, try)
-----------------------------
Please learn those words:
-----------------------------
silhouette
traction
blurred
innovative
vectorise
transition
variations
cument
vectorising
appreciating
strategies
accurate
animate
dimensions
blur-up
vibrant
blurry
manageable
recognizing
gradients
realised
smoother
candidate
applicable
-----------------------------
if you already know those words, please move it into /home/gavin/desktop/scala_code/dwh/resource/wordCount/learndone
-----------------------------
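The last line of the output asks you to move words you already know into the `learndone` directory. A minimal sketch of that step, assuming one word per line and an ISO-style date (`YYYY-MM-DD`) as the file name; the post only says the files are named by date, so the exact format and the helper name `mark_as_learned` are hypothetical.

```python
from datetime import date
from pathlib import Path


def mark_as_learned(words, known_dir, day=None):
    """Append `words` to a file named after `day` (default: today) in `known_dir`."""
    day = day or date.today()
    target = Path(known_dir) / day.isoformat()  # assumed naming: YYYY-MM-DD
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as f:
        # One word per line, de-duplicated and sorted for easy reading.
        for word in sorted(set(words)):
            f.write(word + "\n")
    return target
```

On the next run the job would read this file along with the others in the directory, so these words would no longer show up in the "Please learn those words" list.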

python getWords.py
-----silhouette:---n. 轮廓,剪影 (人的)体形 (事物的)形状 vt. 使呈现影子 使呈现轮廓

Then open Slack:
-----vibrant:---adj. 振动的 响亮的 充满生气的
-----blurry:---adj. 模糊的 污脏的,污斑的
-----recognizing:---v. 认出( recognize的现在分词 ) 承认[认清](某事物) 赏识 承认…有效[属实]
-----gradients:---n. 道路的斜度( gradient的名词复数 ) 坡度 变化程度 (温度、压力等的)变化率
realised---Translation failure
smoother---Translation failure
-----candidate:---n. 报考者 申请求职者 攻读学位者 最后命运或结局如何已显然可见者

 

Even so, I still don't feel like memorizing these words...
