INDEX
Explanations
These activations suggest that the neuron is looking for names of specific places, people, and entities
proper nouns or names associated with people and places
Negative Logits
defect        -0.70
ufact         -0.69
ucl           -0.67
shire         -0.64
etheus        -0.57
imperative    -0.56
narrated      -0.56
ensical       -0.55
arcity        -0.54
cour          -0.54
Positive Logits
hiba          0.80
wagen         0.79
agi           0.74
Mods          0.73
scl           0.70
hei           0.69
kat           0.68
hesda         0.64
apons         0.63
Container     0.62
Activations Density 1.157%