INDEX
Explanations
phrases related to significance or magnitude, specifically the word "big."
Negative Logits
341       -0.15
ncy       -0.14
urence    -0.14
prus      -0.14
ynet      -0.14
oden      -0.14
iland     -0.14
кат       -0.14
گاه       -0.14
ちゃ      -0.13
Positive Logits
oted      0.23
elow      0.18
ölçüde    0.17
/big      0.16
avir      0.16
-big      0.16
ging      0.16
กว        0.15
tuổi      0.15
gie       0.15
Activations Density 0.041%