INDEX
Explanations
words related to personal experiences and explanations
Neuron Alignment
Index   Value   % of L₁
1108    +0.11   0.3%
674     +0.11   0.3%
175     +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
175     +0.11      0.07
586     +0.11      0.05
847     +0.09      0.03
Negative Logits
reluct      -1.74
indestru    -1.71
fta         -1.69
?...        -1.69
emphat      -1.69
strick      -1.69
snoopy      -1.68
increa      -1.67
secon       -1.66
disagre     -1.66
Positive Logits
<bos>           1.09
nonetheless     1.04
nevertheless    0.82
ändå            0.75
anyway          0.70
enough          0.63
comunque        0.62
SystemColors    0.60
.               0.59
certainly       0.59
Activations Density: 0.760%