INDEX
Explanations
the word "thousand" and its variations
Negative Logits
Token     Logit
Cree      -0.16
ayas      -0.15
olic      -0.15
icon      -0.15
room      -0.14
ault      -0.14
ogle      -0.14
uman      -0.14
irk       -0.14
Torres    -0.13
Positive Logits
Token     Logit
fold      0.21
ieves     0.18
zew       0.17
-fold     0.16
awi       0.16
ittest    0.16
orman     0.16
ITO       0.16
th        0.15
naire     0.15
Activations Density 0.078%