Explanations
religious texts or references, particularly from the Bible
Negative Logits
Token         Logit
abbo          -0.16
fsp           -0.15
yor           -0.15
alten         -0.15
.constraint   -0.14
ALSE          -0.14
еÑİ           -0.14
astr          -0.14
warm          -0.14
wer           -0.14
Positive Logits
Token         Logit
Duffy         0.17
inski         0.16
SAR           0.15
777           0.15
rh            0.14
faker         0.14
Sara          0.14
m             0.14
rian          0.14
insky         0.14
Activation Density: 0.116%