Explanations
creating new content or narratives
Negative Logits
on    1.05
ول    1.03
с    1.02
ור    0.96
ра    0.96
to    0.94
as    0.94
ร    0.90
ри    0.89
ے    0.88
Positive Logits
in    1.77
u    1.70
t    1.62
ar    1.41
z    1.34
b    1.33
is    1.28
k    1.21
ad    1.11
w    1.09
Activations Density 0.085%