INDEX
Explanations
This neuron is primarily looking for names of people and places in various contexts
phrases indicating significant events or notable actions
Negative Logits
itably         -0.79
ancial         -0.75
escription     -0.74
;;;;;;;;;;;;   -0.71
irlf           -0.71
uilt           -0.70
ividual        -0.69
ividually      -0.66
retty          -0.66
lished         -0.66
Positive Logits
CLASSIFIED          1.09
NetMessage          0.95
WAYS                0.89
behavi              0.87
natureconservancy   0.81
GoldMagikarp        0.81
largeDownload       0.81
welf                0.76
toile               0.76
Interstitial        0.76
Activations Density 6.087%