Explanations
references to the word "them" in various contexts
Negative Logits
themſelves      -0.96
Efq             -0.91
ſeveral         -0.90
himſelf         -0.88
reaſon          -0.88
AndEndTag       -0.87
Họ              -0.86
bershka         -0.85
cauſe           -0.85
ſtate           -0.85
Positive Logits
تم               0.60
M                0.59
hm               0.59
↵↵               0.58
ム               0.57
bs               0.56
[not rendered]   0.56
m                0.56
The              0.54
EV               0.54
Activation density: 0.046%
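Activation density is read here as the percentage of tokens on which the feature fires. This is an assumption about how the dashboard defines it, not something the page states. Under that reading, 0.046% means the feature is active on roughly 46 of every 100,000 tokens. The sketch below computes the figure from a list of per-token activations. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the percentage of tokens with a positive activation.

    Assumption: "active" means activation > 0, as is typical for
    ReLU-style sparse autoencoder features.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical toy sample: the feature fires on 1 token out of 2,000.
acts = [0.0] * 1999 + [3.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.050%
```

A density this low marks a very sparse feature. It fires on only a small fraction of tokens, consistent with a narrow explanation such as references to "them".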