INDEX
Explanations
mentions of religious leaders, specifically rabbis
Negative Logits
inati          -0.16
.LayoutStyle   -0.15
asu            -0.15
lining         -0.14
adx            -0.14
edis           -0.14
.mi            -0.14
_Tis           -0.14
elps           -0.14
DMIN           -0.14
Positive Logits
binary         0.16
arend          0.15
binary         0.14
eson           0.14
sem            0.14
hiro           0.14
mk             0.14
universe       0.14
596            0.14
esk            0.14
Activations Density: 0.008%