Explanations
words related to seriousness or severity
mentions of serious topics or issues
Negative Logits
Token         Logit
enaries       -0.91
ifully        -0.82
seamlessly    -0.81
wright        -0.78
rious         -0.74
anic          -0.73
sylv          -0.72
anol          -0.71
ilant         -0.71
atu           -0.70
Positive Logits
Token            Logit
consideration    1.03
contender        0.95
doubts           0.94
contenders       0.91
repercussions    0.89
injury           0.88
threat           0.87
jeopardy         0.86
consequences     0.85
threats          0.85
Activations Density 0.065%