INDEX
Explanations
phrases related to criticism and potential negative consequences
phrases that highlight societal issues and their impacts
Negative Logits
Outbreak     -0.72
Mayhem       -0.69
scenarios    -0.68
Problems     -0.65
detail       -0.64
Assignment   -0.64
Incident     -0.62
incidents    -0.62
[+           -0.62
emi          -0.62
Positive Logits
cherished    1.29
bedrock      1.07
vital        1.03
pillars      1.02
cornerstone  1.00
livelihood   1.00
pillar       0.99
precious     0.99
essential    0.98
decency      0.97
Activations Density: 0.514%