Explanations
references to religious figures and events
Negative Logits
dra         -0.14
ató         -0.14
ÑĦÑĤ        -0.13
igi         -0.13
rig         -0.13
Sims        -0.13
baptized    -0.13
kop         -0.13
Patch       -0.13
Ming        -0.13
Positive Logits
Fat         0.24
Mary        0.23
Virgin      0.23
Fat         0.21
Virgin      0.21
virgin      0.20
Mary        0.20
vir         0.20
vir         0.19
mary        0.18
Activations Density 0.064%