Explanations
mentions of regulations or restrictions related to speech or expression
Neuron Alignment
Index   Value   % of L₁
381     +0.12   0.4%
130     +0.12   0.4%
1622    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1622    +0.12      0.04
1262    +0.12      0.05
130     +0.11      0.05
Negative Logits
tramonto     -0.86
papà         -0.76
medesimo     -0.73
mattino      -0.72
ritratto     -0.72
paradiso     -0.71
tempio       -0.70
lusso        -0.69
signore      -0.69
compleanno   -0.68
Positive Logits
No    0.93
No    0.89
NO    0.85
no    0.85
no    0.83
NO    0.82
Nos   0.70
Nos   0.65
nos   0.63
№     0.56
Activations Density: 0.107%