Explanations
phrases related to negative outcomes or consequences
instances of the word "fire" and its variations in various contexts
Negative Logits
Token      Logit
meric      -0.85
anson      -0.81
sembly     -0.75
sie        -0.74
Redmond    -0.74
omo        -0.72
Citizen    -0.71
eston      -0.68
VIDIA      -0.67
Gutenberg  -0.65
Positive Logits
Token      Logit
flies      1.13
fly        0.97
lda        0.81
fighter    0.80
proof      0.78
storm      0.77
ricanes    0.75
hotter     0.74
extingu    0.73
fighters   0.72
Activations Density: 0.016%