Explanations
mentions of serious issues, topics, or situations
Negative Logits
Token          Logit
enaries        -0.89
atu            -0.85
wright         -0.82
tein           -0.75
av             -0.72
sylv           -0.70
eez            -0.69
aston          -0.68
urated         -0.68
remember       -0.68

Positive Logits
Token          Logit
consideration   1.00
contender       0.91
lly             0.89
serious         0.83
danger          0.83
contenders      0.79
threat          0.78
injury          0.77
trouble         0.77
serious         0.77
Activations Density 0.029%