INDEX
Explanations
This neuron mainly activates on words related to options or choices
Phrases that express alternatives or conditionality
Negative Logits
enger        -0.63
avan         -0.61
Peak         -0.58
imen         -0.57
oro          -0.57
oya          -0.56
auri         -0.55
eva          -0.55
risis        -0.54
erb          -0.54
Positive Logits
preferably   0.81
evidenced    0.65
etheless     0.65
congr        0.63
aukee        0.61
artifacts    0.60
especially   0.59
cause        0.56
maybe        0.56
EVEN         0.56
Activations Density 0.548%
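
The page does not define how the density figure is obtained. One common reading, assumed here, is the fraction of sampled tokens on which the neuron's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 8 tokens; two of them are above zero.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.548% would mean the neuron fires on roughly 5 or 6 of every 1,000 sampled tokens.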