INDEX
Explanations
the word "again" at different levels of activation
Neuron Alignment
Index   Value   % of L₁
1265    +0.13   0.5%
47      +0.11   0.4%
131     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
47      +0.13      0.04
1472    +0.11      0.03
131     +0.10      0.02
Negative Logits
تفصیلات      -0.60
kog          -0.58
kard         -0.56
uhr          -0.53
meras        -0.52
buone        -0.51
Mü           -0.51
Transcrip    -0.51
sembla       -0.51
vira         -0.50
Positive Logits
schoolmaster   0.89
wanderer       0.82
parson         0.78
redhead        0.76
gladiator      0.75
indestru       0.74
pamph          0.74
countryman     0.73
steamboat      0.73
rascal         0.73
Activations Density 0.072%