Explanations
mentions of natural disasters, specifically wildfires
references to wildfires or fire incidents
Negative Logits
Birth          -0.75
Tok            -0.72
Barcl          -0.66
Surgery        -0.65
WOR            -0.63
afort          -0.62
çİĭ            -0.61
GB             -0.61
conservative   -0.61
Students       -0.61
Positive Logits
hooting         0.94
fires           0.92
fires           0.90
torches         0.88
paces           0.86
linger          0.85
Fired           0.84
flares          0.83
blazing         0.82
retard          0.81
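Tables like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding and keeping the most positive and most negative tokens. The sketch below assumes that method; this page does not state how its values were computed. The function names, the toy vocabulary, and the 2-dimensional vectors are hypothetical, chosen only to illustrate the ranking step.

```python
# Hypothetical sketch: score every token by the dot product between a
# feature's decoder direction and that token's unembedding vector, then
# keep the k highest (positive logits) and k lowest (negative logits).
# The vectors and vocabulary here are toy values, not real model weights.

def token_logits(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(logits, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    positive = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(logits.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

if __name__ == "__main__":
    decoder_dir = [1.0, 0.5]          # toy feature direction
    unembed = {                       # toy unembedding vectors
        "fires": [0.8, 0.2],
        "blazing": [0.6, 0.4],
        "Birth": [-0.5, -0.5],
        "Students": [-0.3, -0.4],
    }
    pos, neg = top_and_bottom(token_logits(decoder_dir, unembed), 2)
    print("positive:", pos)
    print("negative:", neg)
```

In a real model the same ranking would run over the full vocabulary, and the fire-related tokens at the top of the positive list would be the ones whose unembedding vectors align most closely with the feature direction.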
Activation density: 0.007%
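Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name are assumptions, not taken from this page. Under that reading, 0.007% corresponds to about 7 firing tokens per 100,000.

```python
# Hypothetical sketch: activation density as the percentage of tokens
# whose feature activation exceeds a threshold (assumed to be 0.0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

if __name__ == "__main__":
    # 7 nonzero activations among 100,000 tokens
    acts = [0.0] * 99_993 + [1.2] * 7
    print(f"{activation_density(acts):.3f}%")
```

A density this low marks a sparse feature: it stays silent on nearly all text and activates only in narrow contexts, which is consistent with the wildfire-specific explanations above.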