INDEX
Explanations
safety and security-related terms
terms related to security threats and illegal activities
Negative Logits
sums         -0.56
CONTR        -0.56
Factors      -0.55
Balanced     -0.53
Donation     -0.53
Decision     -0.52
trust        -0.51
amen         -0.51
Piano        -0.51
gratitude    -0.51
Positive Logits
abound       1.03
prolifer     0.98
rampant      0.98
everywhere   0.91
popping      0.91
bloom        0.89
lurking      0.84
prevalent    0.81
emerge       0.81
thrive       0.80
Activations Density: 1.348%