Explanations
It is difficult to say precisely what Neuron 4 is looking for based on the activations provided, as the concepts associated with "somehow" seem varied and context-dependent.
instances of the word "somehow."
Negative Logits
io          -0.65
sych        -0.64
anon        -0.63
oft         -0.63
COUR        -0.62
emies       -0.61
ios         -0.60
starters    -0.59
Prospect    -0.59
Techniques  -0.58
Positive Logits
somew       1.15
magically   0.88
somehow     0.87
managed     0.84
Else        0.82
mirac       0.80
Detected    0.73
unaccount   0.73
oping       0.71
manage      0.71
Activations Density 0.017%