INDEX
Explanations
mentions of US states
references to individual states
Negative Logits
Pastebin    -0.70
Rocket      -0.69
rious       -0.65
Lect        -0.65
sett        -0.62
Maw         -0.60
Notting     -0.59
Icar        -0.59
SPL         -0.59
Murd        -0.59
Positive Logits
manship       1.10
legislatures  0.94
rooms         0.92
men           0.87
chool         0.83
ide           0.82
legalizing    0.81
wide          0.79
states        0.79
hips          0.78
Activations Density: 0.035%