INDEX
Explanations
mentions of additional information or content
Neuron Alignment
Index   Value   % of L₁
812     +0.12   0.4%
1325    +0.10   0.3%
776     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
812     +0.12      0.03
1793    +0.10      0.03
484     +0.10      0.02
Negative Logits
sappi     -0.79
incess    -0.76
ausp      -0.74
decret    -0.73
Áng       -0.73
errone    -0.70
persua    -0.70
ideolog   -0.70
trás      -0.69
consoli   -0.69
Positive Logits
addition    0.78
additions   0.64
addition    0.62
Addition    0.62
Addition    0.55
afectado    0.55
față        0.52
afectada    0.51
aislada     0.51
extraña     0.51
Activations Density 0.044%