INDEX
Explanations
phrases related to historical discoveries, economic activities, and public governance
Neuron Alignment
Index   Value   % of L₁
1276    +0.13   0.4%
555     +0.12   0.4%
1677    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.13      0.09
1677    +0.12      0.08
555     +0.12      0.06
Negative Logits
solidar      -0.74
anse         -0.69
ché          -0.68
antik        -0.67
minimalis    -0.66
dì           -0.64
dè           -0.62
robus        -0.62
bont         -0.62
utop         -0.62
Positive Logits
shenan         0.75
nevertheless   0.70
unspeak        0.65
disreg         0.62
nonetheless    0.61
motherfucker   0.60
unlaw          0.60
politiet       0.60
miscon         0.60
fortunately    0.59
Activations Density 0.236%