Explanations
To a lesser extent, the neuron also identifies government-related terms
terms related to political content and discussions
Negative Logits
scl            -0.88
avorite        -0.72
duino          -0.72
gypt           -0.68
manship        -0.68
practicable    -0.68
nir            -0.66
hire           -0.66
HOME           -0.65
ylum           -0.65
Positive Logits
ized           1.62
ization        1.57
ised           1.30
izing          1.30
izations       1.27
isation        1.23
izes           1.22
ize            1.13
ified          1.10
ation          1.05
Activations Density: 0.036%