INDEX
Explanations
verbs indicating actions or behavior
negative statements about actions or experiences
Negative Logits
accompan        -0.76
unknown         -0.74
accompanied     -0.66
abre            -0.66
unparalleled    -0.65
ゼウス          -0.65
moil            -0.64
pmwiki          -0.64
avoid           -0.64
andan           -0.63
Positive Logits
anymore         1.69
anything        1.36
nor             1.26
any             1.21
anybody         1.17
anywhere        1.09
enough          1.08
ANY             1.05
slightest       1.00
anyone          0.99
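The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to obtain such scores, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest entries. The sketch below illustrates that idea in pure Python with made-up toy vectors; `logit_effects` and `top_and_bottom` are hypothetical helper names, not part of any dashboard API.

```python
# Hypothetical sketch: ranking tokens by a feature's direct logit effect.
# Assumption: score(token) = decoder_dir . unembedding_column(token).
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]          # largest scores first
    negative = ranked[::-1][:k]    # smallest (most negative) scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented for illustration (not real model weights).
    decoder_dir = [1.0, -0.5]
    unembed = {
        "anymore": [1.2, -0.6],
        "any": [0.8, -0.4],
        "avoid": [-0.4, 0.5],
        "unknown": [-0.5, 0.5],
    }
    effects = logit_effects(decoder_dir, unembed)
    pos, neg = top_and_bottom(effects, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With these toy vectors, "anymore" and "any" come out on the positive side and "unknown" and "avoid" on the negative side, mirroring the shape of the lists above; the actual values shown on this page come from the real model and cannot be reproduced from this sketch.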
Activation Density: 0.240%
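Activation density is usually read as the share of tokens on which the feature fires at all; that definition (activation strictly greater than zero) is an assumption here, since the page does not spell it out. Under it, 0.240% means the feature is active on roughly 24 of every 10,000 tokens. A minimal sketch, with `activation_density` as a hypothetical helper name:

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: "density" counts any positive activation as firing.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy data: 24 firing tokens out of 10,000 (invented for illustration).
    acts = [0.0] * 9_976 + [1.0] * 24
    print(f"{activation_density(acts):.3f}%")
```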