Explanations
mentions of religious leaders, particularly rabbis
Negative Logits
ighton    -0.17
zet       -0.15
ktion     -0.15
ance      -0.15
оступ     -0.15
ANDARD    -0.15
leton     -0.15
atural    -0.15
ori       -0.14
ilers     -0.14
Positive Logits
rab       0.20
Rab       0.20
bin       0.20
rab       0.17
bits      0.17
bi        0.16
idity     0.16
rabbits   0.16
shake     0.16
BIT       0.16
Activations Density 0.004%