INDEX
Explanations
references to educational and administrative titles or positions
Negative Logits
ún      -0.20
nửa     -0.15
        -0.15
üçük    -0.14
oris    -0.14
ób      -0.14
deniz   -0.14
醴      -0.14
emple   -0.14
ingly   -0.14
Positive Logits
ate      0.16
Emer     0.15
himself  0.15
님       0.15
reek     0.14
call     0.14
rup      0.14
smouth   0.14
ship     0.14
yyn      0.14
Activations Density: 0.119%