Explanations
The neuron appears to respond to references to specific entities or attributes introduced by the word "that", with an emphasis on explanations or relationships.
The word "that" in various contexts.
Negative Logits
token       logit
brates      -0.76
ormons      -0.76
rior        -0.70
cycles      -0.69
uously      -0.69
oby         -0.69
istics      -0.69
ciples      -0.68
asters      -0.68
Leilan      -0.66
Positive Logits
token            logit
pesky            1.12
fateful          1.08
particular       1.04
same             0.97
kind             0.97
sort             0.85
equation         0.84
aforementioned   0.84
elusive          0.83
type             0.81
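The logit lists above rank vocabulary tokens by how much this neuron raises or lowers their output logits. The page does not say how they were computed; one common approach, assumed here, is to take the dot product of the neuron's output weight vector with each token's unembedding vector and sort the results. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`, `w_out`, `W_U`) and made-up toy vectors, not real model weights.

```python
# Hypothetical sketch (assumed method, not confirmed by the source): rank
# vocabulary tokens by the direct effect of a neuron's output direction on
# their logits. w_out is the neuron's output weight vector (length d_model);
# W_U maps each token string to its unembedding vector (length d_model).

def logit_effects(w_out, W_U):
    """Return {token: dot(w_out, W_U[token])} for every token."""
    return {tok: sum(a * b for a, b in zip(w_out, col)) for tok, col in W_U.items()}

def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # descending by score
    return negative, positive

# Toy example with made-up 2-dimensional vectors (not real model weights).
w_out = [1.0, -0.5]
W_U = {
    "pesky": [1.0, 0.0],
    "same": [0.5, -0.5],
    "cycles": [-0.5, 0.5],
    "rior": [0.0, 1.0],
}
neg, pos = top_and_bottom(logit_effects(w_out, W_U), k=2)
print("negative:", neg)  # [('cycles', -0.75), ('rior', -0.5)]
print("positive:", pos)  # [('pesky', 1.0), ('same', 0.75)]
```

Real pipelines may also fold in layer-norm scaling or other corrections before the projection; this sketch omits them.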
Activation density: 0.102%
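An activation density of 0.102% means the neuron fires on roughly 1 token in 1,000 (1 / 0.00102 ≈ 980). A minimal sketch of that calculation follows, assuming density is the percentage of tokens whose activation is strictly above a threshold of 0; both the threshold and the function name `activation_density` are assumptions, not taken from the source.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a neuron's activation exceeds a threshold. The default threshold of 0.0 is an
# assumption; the source does not state which threshold it uses.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: 1 active token out of 1,000 gives a density of 0.1%.
acts = [0.0] * 999 + [2.3]
print(f"{activation_density(acts):.3f}%")  # 0.100%
```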