INDEX
Explanations
topics or situations that provoke strong reactions of anger or disapproval
Negative Logits
Token        Logit
wagen        -0.51
uchin        -0.50
ramer        -0.49
Solitaire    -0.48
counselor    -0.47
llan         -0.45
cleaner      -0.45
ession       -0.44
kered        -0.44
properties   -0.44
Positive Logits
Token        Logit
ously        0.79
iously       0.74
indignation  0.67
uproar       0.58
outcry       0.56
èĥ           0.53
outrage      0.53
loudly       0.52
ingly        0.50
storm        0.50
Activations Density: 11.760%