INDEX
Explanations
articles or determiners preceding nouns
Negative Logits
arias      -0.17
aucoup     -0.16
ieten      -0.15
iets       -0.15
ARP        -0.15
ót         -0.15
olta       -0.15
(always    -0.14
erable     -0.14
byn        -0.14

Positive Logits
vel         0.15
[space]     0.14
oret        0.14
_Util       0.14
553         0.14
Dek         0.14
565         0.14
Walsh       0.14
<strong     0.13
neutr       0.13
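The page does not say how these scores were produced. A common convention in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive tokens. The sketch below assumes that convention; `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names, and the toy data is invented for illustration only.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: the score for each token is decoder_dir @ W_U, i.e. the
    feature's direct effect on that token's logit.
    decoder_dir: (d_model,) hypothetical feature decoder direction
    W_U: (d_model, vocab_size) hypothetical unembedding matrix
    vocab: list of vocab_size token strings
    """
    effects = decoder_dir @ W_U              # (vocab_size,) score per token
    order = np.argsort(effects)              # indices, lowest score first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [9.0, 9.0, 9.0, 9.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
```

With this toy data, `neg` holds tokens `d` and `b` and `pos` holds tokens `a` and `c`, mirroring the two lists above.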
Activation Density: 0.039%
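Activation density is usually read as the share of tokens on which the feature fires. A value of 0.039% therefore means roughly 39 of every 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above zero. The function name `activation_density` and the `threshold` parameter are hypothetical and not taken from the source.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation
    is strictly greater than the threshold (0 by default).
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 39 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:39] = 1.0
density = activation_density(acts)
```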