INDEX
Explanations
and/or followed by auxiliary
Negative Logits
ون  0.57
ه  0.57
g  0.56
ன்  0.55
v  0.52
d  0.50
a  0.50
Kabhi  0.49
ა  0.49
დი  0.48
Positive Logits
ы  0.68
U  0.55
ó  0.54
it  0.52
to  0.52
?  0.52
are  0.51
esports  0.50
َ  0.50
y  0.49
Activations Density: 0.574%