Explanations
mentions of statistical data or trends
Neuron Alignment
Index   Value   % of L₁
198     +0.14   0.4%
752     +0.11   0.3%
1586    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1586    +0.14      0.05
113     +0.11      0.03
1173    +0.09      0.03
Negative Logits
<bos>       -0.83
relenting   -0.70
mistak      -0.63
chunky      -0.60
wavering    -0.59
classy      -0.56
kawaii      -0.55
peines      -0.55
spania      -0.55
snowy       -0.54
Positive Logits
Bekasi      0.69
azule       0.67
granada     0.65
Trá         0.64
magis       0.64
hcm         0.63
Almería     0.62
aen         0.61
Praça       0.61
tamen       0.61
Activations Density: 0.345%