INDEX
Explanations
phrases related to personal reflections and decision-making
Neuron Alignment
Index   Value   % of L₁
1978    +0.14   0.4%
381     +0.13   0.4%
1919    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.14      0.06
1415    +0.13      0.04
367     +0.09      0.04
Negative Logits
poliester   -0.95
moza        -0.82
kupa        -0.76
karton      -0.73
poliuret    -0.69
materie     -0.68
bronz       -0.68
Njema       -0.67
magazin     -0.66
muze        -0.63
Positive Logits
indestru    0.75
disagre     0.70
inconce     0.65
downvotes   0.64
apprehen    0.62
shenan      0.62
cushi       0.62
hairc       0.61
disreg      0.61
encomp      0.61
Activations Density 0.156%