INDEX
Explanations
titles of sections within an article or study
Neuron Alignment
Index   Value   % of L₁
1699    +0.14   0.4%
906     +0.13   0.4%
468     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
924     +0.14      0.04
1856    +0.13      0.05
790     +0.11      0.03
Negative Logits
Assista        -0.92
Obrigada       -0.87
Abraços        -0.85
Abraço         -0.83
اقرأ           -0.81
Todavía        -0.80
Conheça        -0.80
Ambos          -0.79
Opinión        -0.79
Curiosidades   -0.78
Positive Logits
increa     1.62
encomp     1.61
maneu      1.56
depic      1.55
reluct     1.51
guarante   1.51
suscep     1.51
impra      1.49
attemp     1.46
wherea     1.46
Activations Density 0.270%