INDEX
Explanations
phrases related to books or literature, and also words related to vehicles
Neuron Alignment
Index    Value    % of L₁
1013     +0.10    0.3%
86       +0.09    0.3%
849      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1368     +0.10       0.03
1559     +0.09       0.03
1675     +0.08       0.05
Negative Logits
accla      -1.46
emphat     -1.41
madonna    -1.41
embra      -1.41
secon      -1.40
wien       -1.39
casio      -1.38
increa     -1.37
perfet     -1.37
vhs        -1.36
Positive Logits
tiny      0.90
small     0.87
Small     0.82
small     0.82
Small     0.82
tiny      0.79
kecil     0.76
和小      0.74
小的      0.73
little    0.72
Activations Density 0.461%