Explanations
warning messages or alerts
warning statements related to potential dangers or sensitive content
Negative Logits
aepernick      -0.85
Laughs         -0.78
Favorite       -0.76
ichick         -0.76
obbies         -0.76
Interview      -0.74
brates         -0.73
hement         -0.71
chens          -0.69
excuse         -0.69
Positive Logits
dangers         1.45
impending       1.24
risks           1.18
danger          1.16
pitfalls        1.12
beware          1.06
dire            1.05
imminent        1.03
danger          0.96
consequences    0.95
Activation Density: 0.268%