INDEX
Explanations
phrases related to unexpected negative consequences or outcomes
terms related to failures or adverse outcomes, particularly involving "fire" as a metaphor
Negative Logits
member    -0.70
agar      -0.69
said      -0.68
hi        -0.68
anson     -0.65
uala      -0.64
sal       -0.63
nih       -0.63
uv        -0.63
wra       -0.62
Positive Logits
tactics     0.68
aneously    0.68
iveness     0.66
Healer      0.66
Moreno      0.65
姫          0.65
ANCE        0.65
icult       0.63
ļéĨĴ        0.63
acies       0.62
Activations Density 0.097%