INDEX
Explanations
words related to problematic situations or conflicts
references to trouble and its potential consequences
Negative Logits
oliberal    -0.88
ulture      -0.75
uine        -0.74
hetics      -0.72
ithe        -0.70
ivist       -0.68
orney       -0.68
entric      -0.67
audi        -0.67
ITAL        -0.67
Positive Logits
hooting      1.05
troubles     0.90
makers       0.85
trouble      0.81
maker        0.77
fully        0.75
some         0.73
interfering  0.70
flake        0.69
making       0.69
Activations Density 0.018%
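
The page does not say how the density figure is computed. As a rough sketch only, the code below assumes that density means the percentage of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires at all; the source does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, with the feature firing on 2.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.018% would correspond to the feature firing on roughly 18 of every 100,000 token positions.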