INDEX
Explanations
Terms related to societal and political issues, particularly justice, reform, and systemic challenges.
Negative Logits
Brow          -0.58
Stab          -0.56
laugh         -0.55
cise          -0.55
hin           -0.54
Honour        -0.54
Saying        -0.53
Coral         -0.52
ullivan       -0.52
Thanksgiving  -0.52
Positive Logits
exists     1.14
improves   1.07
dominates  1.03
tends      1.02
occurs     1.01
persists   1.00
requires   0.99
isn        0.98
involves   0.98
entails    0.98
Activations Density: 0.176%