INDEX
Explanations
references to religious or spiritual themes and figures
Negative Logits
DY          -0.15
-gnu        -0.15
命          -0.14
faut        -0.14
_tD         -0.14
mars        -0.14
lfw         -0.14
ς           -0.13
Rhino       -0.13
ichten      -0.13
Positive Logits
nat          0.36
Bethlehem    0.33
Nat          0.33
Mary         0.29
stable       0.28
nat          0.28
Stable       0.27
Baby         0.27
Mary         0.26
stable       0.26
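Lists like the two above are often produced by a logit-lens style projection. The following is a minimal sketch that assumes, since this page does not say so, that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. The function names, the 3-dimensional vectors, and the 4-token vocabulary are hypothetical toy values, not this feature's actual weights.

```python
# Assumption: logit for token t = sum_d w_dec_row[d] * W_U[d][t]
# (feature decoder direction projected through the unembedding).

def feature_logits(w_dec_row, W_U):
    """Return one logit per vocabulary token. W_U is d_model rows x vocab_size columns."""
    vocab_size = len(W_U[0])
    return [
        sum(w_dec_row[d] * W_U[d][t] for d in range(len(w_dec_row)))
        for t in range(vocab_size)
    ]

def top_bottom(logits, vocab, k):
    """Return the k most positive and k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    bottom = [(vocab[t], logits[t]) for t in order[:k]]
    top = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return top, bottom

# Hypothetical toy model: 3-dim residual stream, 4-token vocabulary.
vocab = ["Mary", "stable", "mars", "DY"]
W_U = [
    [1.0, 0.5, -0.2, -0.4],
    [0.0, 0.3, 0.1, -0.1],
    [0.2, 0.0, -0.3, 0.0],
]
w_dec_row = [0.3, 0.2, 0.1]

logits = feature_logits(w_dec_row, W_U)
top, bottom = top_bottom(logits, vocab, k=2)
print(top)     # "Mary" then "stable"
print(bottom)  # "DY" then "mars"
```

With real weights the same two functions would simply run over the full vocabulary and return the top and bottom ten tokens.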
Activation Density 0.017%
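A density of 0.017% means the feature fires on roughly 17 of every 100,000 token positions. The sketch below assumes that density is the fraction of positions whose activation exceeds zero. The page does not state the threshold or the evaluation corpus, and `activation_density` and the synthetic activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds the threshold (assumed 0)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Synthetic data: 17 firing positions out of 100,000.
acts = [0.0] * 99_983 + [1.2] * 17
density = activation_density(acts)
print(f"{density:.3%}")  # 0.017%
```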