INDEX
Explanations
The neuron activates most strongly on the word "these" across a variety of contexts
occurrences of the word "these" in various contexts
Negative Logits
llor       -0.72
zzle       -0.70
ppe        -0.69
pless      -0.69
achus      -0.67
/          -0.66
renheit    -0.66
Tank       -0.65
lest       -0.64
nah        -0.62
Positive Logits
fellows        0.90
newfound       0.84
kinds          0.77
findings       0.77
sorts          0.73
guys           0.73
proceedings    0.73
particular     0.72
sights         0.72
respective     0.70
Activations Density 0.060%
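
A density figure like the one above is commonly read as the share of sampled tokens on which the neuron fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the toy data are assumptions for illustration, not the dashboard's actual method or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a non-zero
    (above-threshold) activation; the source does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 tokens, 6 of which activate the neuron.
toy = [0.0] * 9994 + [1.3, 0.4, 2.1, 0.9, 0.2, 3.0]
density = activation_density(toy)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.060% would correspond to roughly 6 active tokens per 10,000 sampled.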