Explanations
phrases related to harmful actions or situations
terms related to endangerment and harm to individuals or the public
Negative Logits
Token         Logit
negotiator    -0.68
metab         -0.66
rera          -0.64
grou          -0.63
Explorer      -0.62
Unch          -0.61
contender     -0.61
chamber       -0.59
delinquent    -0.59
backlog       -0.58
Positive Logits
Token         Logit
ings          0.98
ments         0.95
liest         0.91
ength         0.90
ences         0.85
ingly         0.79
ning          0.79
liness        0.79
lihood        0.78
ãĥĨ           0.78
Activations Density: 0.032%