Explanations
gloves and related contexts
Negative Logits
ed    0.71
ير    0.71
er    0.65
ার    0.64
гне    0.60
سال    0.59
,    0.59
ため    0.58
almac    0.58
,!    0.58
Positive Logits
P    1.02
G    1.00
R    0.92
F    0.87
T    0.84
W    0.84
J    0.84
S    0.83
Y    0.83
D    0.82
Activations Density: 0.002%