Explanations
descriptive terms related to actions or behaviors of individuals
Neuron Alignment
Index   Value   % of L₁
1870    +0.13   0.4%
990     +0.11   0.3%
297     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
990     +0.13      0.06
332     +0.11      0.06
284     +0.10      0.07
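The two columns above measure different things, and a short sketch may help when reading them. This is a minimal illustration using the standard definitions of Pearson correlation and cosine similarity. It assumes "P. Corr." is computed over paired activation values and "Cos Sim." over weight vectors; the page does not state this, and the function names `pearson` and `cosine` are hypothetical, not part of any tool's API.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of activations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts each sequence's mean before comparing, while cosine similarity does not, so the two values for the same neuron need not agree.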
Negative Logits
interessa   -1.06
Rgds        -0.98
parteci     -0.97
dovr        -0.97
lancia      -0.95
dovre       -0.95
trovo       -0.93
sappi       -0.92
aspetta     -0.91
vuol        -0.90
Positive Logits
EINVAL        0.62
.             0.59
are           0.55
ünste         0.52
vannak        0.52
and           0.52
HttpHeaders   0.51
,             0.50
-             0.49
ections       0.49
Activations Density: 0.393%