INDEX
Explanations
It seems that neuron 4 is looking for names or terms related to specific places or persons
single character tokens or names with specific letters
Negative Logits
DERR        -0.76
APS         -0.68
BILITIES    -0.62
growth      -0.61
OTAL        -0.59
İĭ          -0.58
CLOSE       -0.57
lest        -0.57
somet       -0.57
rall        -0.56
Positive Logits
sted        0.66
Hospital    0.65
hybrids     0.62
heit        0.62
heid        0.62
combo       0.61
joint       0.61
?????-      0.60
imore       0.60
enna        0.59
Activations Density 0.284%