Explanations
examples provided are not sufficient to determine a specific pattern or preference for this neuron
the letter "g" in various contexts
Negative Logits
sclerosis    -0.71
derail       -0.69
regist       -0.66
moderator    -0.65
Wonderful    -0.63
Rated        -0.63
intern       -0.62
booster      -0.61
kickoff      -0.59
FINE         -0.59
Positive Logits
asp          1.00
raphics      0.95
ardless      0.92
uild         0.84
ascript      0.83
bags         0.81
ars          0.79
athering     0.79
ods          0.78
oths         0.77
Activations Density 0.010%