INDEX
Explanations
references to experiences and comparisons in a specific context
Neuron Alignment
Index    Value    % of L₁
80       +0.10    0.3%
1129     +0.09    0.3%
845      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
991      +0.10       0.04
1379     +0.09       0.05
1129     +0.09       0.06
Negative Logits
<bos>          -0.72
TypedValue     -0.58
restera        -0.50
To             -0.49
appartient     -0.49
PrintStream    -0.48
to             -0.47
Search         -0.47
devra          -0.47
andolo         -0.47
Positive Logits
hairc        1.21
thut         1.10
scrat        1.09
swarovski    1.06
hentai       1.06
milf         1.05
tew          1.01
fta          1.00
?...         0.99
greate       0.99
Activations Density: 0.436%