INDEX
Explanations
expressions of personal predicament or confusion
Negative Logits
Token     Logit
apult     -0.15
lectron   -0.15
jedn      -0.15
Ñģли      -0.15
INST      -0.14
illac     -0.14
imde      -0.14
iden      -0.14
elerik    -0.14
deen      -0.14
Positive Logits
Token     Logit
having    0.23
lost      0.21
Having    0.20
new       0.20
brand     0.20
sure      0.19
fairly    0.19
having    0.19
stuck     0.18
facing    0.18
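
The page does not say how the logit lists above were produced. A common convention for dashboards of this kind is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below illustrates that convention on random toy data; `top_logit_tokens`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names chosen for illustration, not identifiers from this page or from any particular tool.

```python
import numpy as np


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the feature's effect on each output token is approximated by
    decoder_dir @ unembed. This is a common convention, not confirmed here.
    """
    logits = decoder_dir @ unembed              # shape: (vocab_size,)
    order = np.argsort(logits)                  # indices sorted ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy data only: random vectors and placeholder token names.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
unembed = rng.normal(size=(d_model, vocab_size))

negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=3)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, tokens such as "having", "lost", and "stuck" would be the ones whose output logits the feature raises most when it is active.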
Activations Density 0.061%
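
To give a sense of scale, a density of 0.061% corresponds to roughly 61 active tokens per 100,000. The sketch below assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; that definition, the function name `activation_density`, and the synthetic activation list are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 61 active tokens out of 100,000.
acts = [0.0] * 99_939 + [1.5] * 61
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.061%
```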