INDEX
Explanations
phrases starting with "Our"
instances of the word "Our."
Negative Logits
CENT                 -0.66
``(                  -0.66
liest                -0.64
ambers               -0.64
LSD                  -0.63
quote                -0.61
externalToEVAOnly    -0.61
–                    -0.61
cum                  -0.60
conom                -0.60
Positive Logits
selves               1.44
own                  0.99
¥ŀ                   0.98
self                 0.93
anmar                0.92
adversary            0.83
cyclopedia           0.83
adversaries          0.82
ourselves            0.78
fearless             0.77
Activations Density 0.049%