INDEX
Explanations
phrases relating to expectations and outcomes
Negative Logits
ягом
-0.15
Крім
-0.12
одар
-0.12
，以及
-0.12
жу
-0.12
rire
-0.11
uming
-0.11
Ķëĭ¤
-0.11
たり
-0.11
loha
-0.11
Positive Logits
but
1.15
but
0.93
nhưng
0.84
BUT
0.77
但
0.74
но
0.74
_but
0.73
pero
0.71
But
0.71
，但
0.71
Activations Density 4.197%