INDEX
Explanations
square shapes, faces, panels
Negative Logits
ت    0.73
т    0.68
It    0.65
ي    0.65
on    0.64
يّ    0.64
On    0.60
وفر    0.59
If    0.59
理解    0.58
POSITIVE LOGITS
as    0.84
f    0.70
c    0.70
và    0.70
۰    0.70
ast    0.69
და    0.68
and    0.66
và    0.66
im    0.65
Activations Density 0.001%