Explanations
expressions of confusion or uncertainty related to new experiences or learning
Negative Logits
atori   -0.15
กรรม    -0.15
aked    -0.14
ẩm      -0.14
auty    -0.14
idlo    -0.14
span    -0.14
XT      -0.13
isci    -0.13
135     -0.13
Positive Logits
IPA     0.17
uggage  0.16
>NN     0.16
celik   0.15
Slave   0.14
$LANG   0.14
idas    0.14
习      0.14
rias    0.14
_NR     0.14
Activations Density 0.121%