INDEX
Explanations
mentions of serious matters or issues
references to serious issues or concerns
NEGATIVE LOGITS
enaries       -0.82
wright        -0.74
ride          -0.73
sylv          -0.72
AW            -0.70
ifully        -0.69
seamlessly    -0.69
weet          -0.67
urated        -0.67
av            -0.65
POSITIVE LOGITS
consideration  1.02
lly            0.88
threat         0.87
contender      0.86
injury         0.85
threats        0.84
allegation     0.84
contenders     0.83
trouble        0.82
danger         0.80
Activations Density: 0.038%