INDEX
Explanations
references to utopian themes or concepts
Negative Logits
Token     Logit
SSION     -0.17
edi       -0.16
ofday     -0.16
haps      -0.16
haf       -0.15
hip       -0.15
hammer    -0.15
UMENT     -0.15
hawks     -0.15
ädchen    -0.15
Positive Logits
Token     Logit
opian     0.32
opia      0.29
most      0.28
ters      0.25
recht     0.23
tar       0.22
imately   0.22
MOST      0.22
lim       0.22
tering    0.22
Activations Density: 0.013%