INDEX
Explanations
phrases related to negative attributes or impacts
mentions of negative concepts or experiences
Negative Logits
DOM            -0.87
conservancy    -0.81
ITNESS         -0.81
heet           -0.78
dropping       -0.77
hower          -0.77
plain          -0.77
pread          -0.77
abiding        -0.76
raltar         -0.76
Positive Logits
reinforcement  1.05
spiral         0.90
Negative       0.89
gearing        0.86
impact         0.85
effects        0.85
consequence    0.84
feedback       0.82
consequences   0.82
publicity      0.82
Activations Density: 0.023%