INDEX
Explanations
words related to tolerance and acceptance
Negative Logits
ikal     -0.16
oria     -0.16
elim     -0.15
ɵ        -0.15
hausen   -0.15
inen     -0.15
zug      -0.15
yll      -0.15
elin     -0.14
oman     -0.14
Positive Logits
ampo     0.18
tol      0.17
452      0.16
/mit     0.14
्ह       0.14
ulet     0.14
och      0.14
데이트    0.14
ftime    0.14
ruba     0.14
Activations Density 0.015%