INDEX
Explanations
prompts requesting interaction or action from the reader
requests or prompts for action
Negative Logits
cler          -0.76
pires         -0.71
arc           -0.69
Huntington    -0.69
visor         -0.68
law           -0.64
tin           -0.64
lings         -0.63
imposed       -0.63
borgh         -0.63
Positive Logits
advise        0.90
note          0.88
forgive       0.84
enable        0.83
sir           0.82
beware        0.82
dont          0.80
Subscribe     0.80
fill          0.78
pardon        0.78
Activations Density 0.016%