INDEX
Explanations
words related to research findings and discussions about various topics
Neuron Alignment
Index   Value   % of L₁
1013    +0.11   0.3%
1870    +0.10   0.3%
382     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
109     +0.11      0.06
468     +0.10      0.06
1867    +0.09      0.05
Negative Logits
specialmente   -0.99
perciò         -0.90
chiaramente    -0.89
purtroppo      -0.85
solidar        -0.85
persino        -0.83
ideolog        -0.83
anzi           -0.81
rispond        -0.81
succede        -0.78
Positive Logits
.            0.71
;            0.66
,            0.58
maxSize      0.57
。           0.54
rval         0.54
and          0.53
<bos>        0.52
totalCount   0.52
ly           0.52
Activations Density 0.425%