INDEX
Explanations
The neuron seems to find words or phrases related to reasons or explanations
the presence of the verb "is" in various contexts
Negative Logits
Highlights         -0.73
icter              -0.73
tainment           -0.71
itas               -0.69
vana               -0.64
vantage            -0.64
lez                -0.64
umbs               -0.64
congratulations    -0.63
abouts             -0.63
Positive Logits
unregulated        0.86
such               0.85
technically        0.85
rarely             0.85
ostensibly         0.85
inherently         0.84
geographically     0.84
so                 0.83
notoriously        0.83
already            0.83
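The two logit lists read as the ten tokens the neuron pushes down most and the ten it pushes up most. Below is a minimal sketch of how such lists could be produced from per-token logit effects. The function name `top_logits` and the `effects` mapping are hypothetical, and the sample values are a subset copied from the lists above. How the dashboard itself computes the effects is not stated in the source.

```python
def top_logits(effects, k=10):
    """Return (most_negative, most_positive) token/value pairs.

    `effects` maps a token string to the neuron's effect on that
    token's logit. This is an illustrative helper, not the
    dashboard's own code.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    most_negative = ranked[:k]        # lowest values first
    most_positive = ranked[::-1][:k]  # highest values first
    return most_negative, most_positive


# Subset of the values listed above, used as sample input.
effects = {
    "unregulated": 0.86,
    "such": 0.85,
    "already": 0.83,
    "Highlights": -0.73,
    "itas": -0.69,
    "abouts": -0.63,
}

negative, positive = top_logits(effects, k=2)
print(negative)
print(positive)
```

With `k=2` the sketch returns the two lowest and the two highest entries of the sample mapping, in the same order the lists above use (most extreme first).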
Activations Density 0.316%
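One common reading of an activation density figure is the fraction of token positions on which the neuron fires at all. The sketch below assumes that definition, with "fires" meaning an activation above a threshold of 0. The function name, the threshold, and the toy data are assumptions for illustration, not details taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes density means "share of positions where the neuron is
    active"; the source does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 316 active positions out of 100,000 sampled tokens.
toy = [1.0] * 316 + [0.0] * (100_000 - 316)
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.316% corresponds to roughly 3 active positions per 1,000 tokens.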