INDEX
Explanations
phrases indicating urgency or imperative advice directed at the reader
Negative Logits
inst          -0.15
braco         -0.14
nowrap        -0.14
COPE          -0.14
orge          -0.14
cope          -0.14
TemplateName  -0.14
ergus         -0.14
orical        -0.14
enor          -0.14
Positive Logits
æ¯     0.15
ivé    0.14
kes    0.14
805    0.14
pais   0.13
arias  0.13
yth    0.13
584    0.13
eru    0.13
001    0.12
Activations Density: 0.049%