INDEX
Explanations
sentence
The neuron activates on occurrences of the word “sentence” (and its inflected forms) in the text.
Negative Logits
_get          -0.07
Mag           -0.07
stud          -0.06
Coalition     -0.06
14            -0.06
air           -0.06
coalition     -0.06
anarchist     -0.06
.Get          -0.06
expo          -0.06
Positive Logits
sentencing    0.08
센            0.08
sentence      0.08
Sentence      0.07
sentenced     0.07
_serializer   0.07
)set          0.07
tint          0.07
ених          0.07
etur          0.07
Activations Density: 0.005%