INDEX
Explanations
The neuron looks for words related to the layers or boundaries of objects.
References to the "outer" layers or boundaries of various subjects.
Negative Logits
Token      Logit
essors     -0.78
inators    -0.74
inatory    -0.74
inator     -0.74
netflix    -0.73
Pitt       -0.72
bley       -0.71
anwhile    -0.71
HCR        -0.71
utes       -0.70
Positive Logits
Token          Logit
most           1.31
wear           1.01
casing         0.85
circumference  0.85
worldly        0.84
borough        0.84
diameter       0.83
garments       0.79
bounds         0.79
extrem         0.77
Activations Density: 0.012%