Explanations
phrases related to societal issues and personal experiences
Negative Logits
Token       Logit
overlook    -1.02
roundup     -0.94
inflic      -0.92
incompet    -0.89
drown       -0.88
arch        -0.88
qualified   -0.86
shelling    -0.85
adjud       -0.85
aven        -0.84
Positive Logits
Token       Logit
They        1.46
We          1.46
Our         1.36
Everything  1.34
It          1.34
Sometimes   1.34
Too         1.28
There       1.28
When        1.27
What        1.27
Activations Density 0.535%
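
A density figure like this is commonly read as the share of sampled tokens on which the feature activates at all. The sketch below illustrates that reading only; it assumes density means "fraction of tokens with activation above a threshold", and the function name, the threshold default, and the sample values are hypothetical rather than taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 tokens; only two are non-zero.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "25.000%"
```

Under the same assumption, a density of 0.535% would correspond to roughly 5 active tokens in every 1,000 sampled.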