Explanations
These activations suggest that the neuron is looking for references to technical procedures and instructions.
Negative Logits
ratios      -0.52
tein        -0.50
Divide      -0.46
sinks       -0.45
awaru       -0.44
scores      -0.44
èĢħ         -0.43
Corridor    -0.43
heights     -0.43
pilgrims    -0.43
Positive Logits
pport       0.75
seless      0.65
ccess       0.64
Torrent     0.54
tv          0.52
tymology    0.51
inent       0.51
xt          0.51
ft          0.50
lv          0.50
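The two logit lists above can be read as the extremes of a per-token score. A minimal sketch of how such lists might be selected, assuming a plain mapping from token string to score is already available (the function name `top_logit_tokens` and the `effects` dictionary are hypothetical, and the page does not say how its scores are computed):

```python
import heapq

def top_logit_tokens(logit_effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    logit_effects is a hypothetical dict mapping token string -> score.
    """
    items = list(logit_effects.items())
    # The k smallest scores, most negative first.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # The k largest scores, most positive first.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive

# A few values copied from the lists above, used only as sample input.
effects = {
    "pport": 0.75, "seless": 0.65, "ccess": 0.64,
    "ratios": -0.52, "tein": -0.50, "Divide": -0.46,
}
negative, positive = top_logit_tokens(effects, k=2)
```

With this sample input, `negative` holds the `ratios` and `tein` entries and `positive` holds the `pport` and `seless` entries, in the same order as the lists on the page.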
Activations Density 4.717%
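The page does not define the density figure. One plausible reading, stated here as an assumption, is the fraction of sampled tokens on which the neuron's activation exceeds zero. A minimal sketch under that assumption (the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above threshold.

    Assumes density means 'share of tokens where the neuron fires';
    the source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Made-up activations for eight tokens; two of them are above zero.
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 25.000%
```

Under this reading, a density of 4.717% would mean the neuron is active on roughly one token in twenty-one.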