INDEX
Explanations
statements urging or promoting positive actions or behaviors
Neuron Alignment
Index   Value   % of L₁
1392    +0.12   0.4%
1379    +0.11   0.4%
347     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1379    +0.12      0.03
1141    +0.11      0.03
1392    +0.11      0.03
Negative Logits
Inoltre           -0.53
offrir            -0.52
représentants     -0.52
Ciò               -0.50
Caratteristiche   -0.50
effetto           -0.49
accordo           -0.49
wieś              -0.48
asString          -0.48
municipi          -0.48
Positive Logits
encouragement     1.00
encourage         0.96
encouraging       0.94
encouraged        0.94
Encourage         0.86
encourages        0.85
territo           0.83
couraging         0.82
couraged          0.80
discouraged       0.77
Activations Density 0.087%