INDEX
Explanations
references to Sikhism and specific figures associated with it
Negative Logits
iture     -0.15
Bryant    -0.15
inger     -0.14
ures      -0.14
eling     -0.14
uer       -0.14
arrant    -0.13
iao       -0.13
uing      -0.13
pline     -0.13
Positive Logits
vala      0.19
afka      0.17
orns      0.16
idian     0.15
idata     0.15
blogs     0.14
bottoms   0.14
chor      0.14
re        0.14
omore     0.14
Activations Density 0.019%