Explanations
original
This neuron specifically detects occurrences of the word “original” (including its subword pieces) in the text.
Negative Logits
-axis      -0.06
neuen      -0.06
Approx     -0.06
]>=        -0.06
skept      -0.06
awarded    -0.06
δόν        -0.06
た         -0.06
ATEST      -0.06
pd         -0.06
Positive Logits
Chi            0.08
Illuminate     0.07
mattresses     0.07
CLE            0.06
.constant      0.06
MAR            0.06
.CharField     0.06
')</           0.06
.Agent         0.06
chair          0.06
Activations Density: 0.016%
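
The density figure can be read as the share of token positions on which the neuron is active. The sketch below assumes that density means the fraction of positions whose activation exceeds zero; the page does not state the exact definition, and the function name, threshold, and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with activation > 0.
    The dashboard does not document its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the neuron.
sample = [0.0] * 10_000
sample[5] = 1.3
sample[7_000] = 0.4

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.020%
```

Under this reading, a density of 0.016% would correspond to roughly 16 active positions per 100,000 tokens, which fits a neuron that responds to a single specific word.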