Explanations
references to the word "couple" indicating quantity
Negative Logits
Token       Logit
ollapsed    -0.18
noc         -0.16
imas        -0.16
ibal        -0.14
छ           -0.14
อน          -0.14
-gnu        -0.13
ály         -0.13
rub         -0.13
eter        -0.13
Positive Logits
Token       Logit
dozen       0.22
legs        0.15
atr         0.14
UnderTest   0.14
of          0.14
ième        0.14
hundred     0.14
eki         0.14
erus        0.14
lotte       0.13
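Dashboards often build negative and positive logit lists like these by projecting a feature's decoder direction onto the model's unembedding matrix, then reading off the lowest and highest entries. This page does not confirm that it uses this method, so treat the following as a sketch that assumes it. The names `w_dec` (the feature's decoder vector, shape `d_model`) and `W_U` (the unembedding, shape `d_model x vocab_size`) are hypothetical, and the demo uses random toy weights rather than a real model.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumes (hypothetically) that a feature's logit effect is w_dec @ W_U.
    w_dec: decoder direction for one feature, shape (d_model,)
    W_U:   unembedding matrix, shape (d_model, vocab_size)
    vocab: list of token strings, length vocab_size
    """
    logits = w_dec @ W_U          # direct effect of the feature on every output logit
    order = np.argsort(logits)    # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy demo with random weights (not the real model or feature).
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
W_U = rng.normal(size=(d_model, vocab_size))
w_dec = rng.normal(size=d_model)
neg, pos = top_logit_effects(w_dec, W_U, vocab, k=3)
for tok, val in neg + pos:
    print(f"{tok:10s}{val:+.2f}")
```

Because the ranking uses a single matrix-vector product, it measures only the feature's direct effect on the output logits. It does not capture effects routed through later layers.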
Activation Density: 0.009%
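The density figure can be read as the share of sampled tokens on which the feature fires. That reading is an assumption, since the page does not define the threshold or the sample. The sketch below counts activations strictly above zero. Under that reading, 0.009% corresponds to roughly 9 tokens in every 100,000. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    acts: 1-D array of one feature's activations over a token sample
    (hypothetical input; the dashboard's actual sampling is not documented here).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy sample: 100,000 tokens, with the feature active on 9 of them.
acts = np.zeros(100_000)
acts[:9] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")
```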