INDEX
Explanations
mentions of evolutionary themes or concepts
references to the word "Evil" and its variants
Negative Logits
Gord          -0.68
prescribed    -0.67
Sara          -0.66
Sharif        -0.66
ends          -0.65
ribbon        -0.65
gru           -0.65
Luxem         -0.64
brass         -0.64
XL            -0.64
Positive Logits
Ev            3.88
EV            1.53
Iv            1.38
Ir            1.36
Ec            1.36
Dar           1.28
ev            1.25
Ay            1.23
Av            1.21
Factor        1.12
Activations Density 0.017%