Explanations
This neuron mainly responds to words related to overriding or overruling something.
It fires on terms related to the concept of "overriding" or "overridden," often in legal or technical contexts.
Negative Logits
Token               Logit
=-=-=-=-=-=-=-=-    -0.78
istics              -0.72
Sham                -0.70
¯¯¯¯                -0.70
Drum                -0.68
Gamer               -0.67
Torrent             -0.66
dar                 -0.65
pring               -0.65
fare                -0.64
Positive Logits
Token               Logit
overr               1.01
idden               0.99
uled                0.89
xual                0.87
override            0.83
uling               0.74
ides                0.73
overriding          0.71
ulation             0.70
ulator              0.67
Activation Density: 0.023%
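
The page does not say how the density figure is computed. As a rough sketch, assuming it is the fraction of token positions on which the neuron's activation exceeds zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the neuron fires
    at all. The dashboard may use a different definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which only 2 activate.
sample = [0.0] * 9998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.020%
```

Under this assumed definition, a density of 0.023% would mean the neuron is active on roughly 2 or 3 out of every 10,000 token positions.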