Explanations
constant
The neuron fires on mentions of defining or using “constants.”
Negative Logits
Token     Value
WANT      -0.06
score     -0.06
Aff       -0.06
많        -0.06
Patty     -0.06
конкур    -0.06
squat     -0.06
결혼      -0.06
Intent    -0.06
ATT       -0.06
Positive Logits
Token       Value
Ads         0.08
gentlemen   0.07
constants   0.07
usted       0.07
.DE         0.06
YA          0.06
sten        0.06
earn        0.06
basın       0.06
ophysical   0.06
Activations Density: 0.015%
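
To make the density figure concrete, here is a minimal sketch. It assumes "activations density" means the fraction of tokens on which the neuron's activation is above zero; the page does not define the term, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (# tokens with activation > threshold) / (# tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): the neuron fires on 3 of 20,000 tokens.
acts = [0.0] * 20_000
acts[5] = 1.2
acts[100] = 0.4
acts[19_999] = 2.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.015%
```

Under this reading, a density of 0.015% means the neuron is active on roughly 15 of every 100,000 tokens, so it is very sparse.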