INDEX
Explanations
colloquial expressions and phrases related to subtle distinctions or critiques
Negative Logits
Majefty          -0.65
ſtre             -0.62
EndContext       -0.62
anſ              -0.60
ClientSize       -0.59
ActivityResult   -0.59
ſta              -0.58
hemispheres      -0.58
fhort            -0.58
reaſon           -0.58
Positive Logits
anti       0.90
liberal    0.69
ANTI       0.68
Anti       0.68
non        0.68
ANTI       0.64
:✨         0.64
partisan   0.62
pro        0.62
anti       0.61
Activation Density: 0.525%