INDEX
Explanations
references to fires
mentions of fire-related incidents or terms
Negative Logits
xus      -0.83
Virtue   -0.80
Vide     -0.80
Freed    -0.75
Tok      -0.71
laus     -0.70
amaru    -0.69
atem     -0.68
afort    -0.67
Birth    -0.65
Positive Logits
flies     1.13
storm     1.06
exting    1.05
fighting  1.00
proof     0.96
storms    0.95
fight     0.93
trap      0.92
bomb      0.92
balls     0.90
Activations Density: 0.022%