INDEX
Explanations
proper nouns that seem to be related to political or social contexts
prominent people, organizations, and geopolitical references
Negative Logits
Crescent    -0.50
OURCE       -0.49
Curious     -0.48
ULL         -0.48
Comments    -0.48
unden       -0.48
luster      -0.47
++++++++    -0.47
.#          -0.47
+.          -0.46
Positive Logits
didnt       0.85
hadn        0.84
forgot      0.81
could       0.80
doesnt      0.78
had         0.78
cannot      0.76
deserved    0.75
should      0.75
lacks       0.74
Activations Density: 0.585%