INDEX
Explanations
mentions of fire departments
mentions of specific organizations and individuals
Negative Logits
puff      -0.94
gang      -0.67
fortune   -0.65
cules     -0.65
alore     -0.64
Corpus    -0.64
balls     -0.63
riot      -0.56
esome     -0.56
fuck      -0.56
Positive Logits
oug       0.83
obin      0.81
icate     0.80
iamond    0.78
ob        0.78
orf       0.76
uct       0.76
ijk       0.75
axter     0.75
olphin    0.74
Activations Density 0.047%