Explanations
personal pronouns and phrases indicating personal opinions or thoughts
Neuron Alignment
Index   Value   % of L₁
381     +0.17   0.5%
805     +0.14   0.4%
1919    +0.13   0.4%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1919    +0.17           0.11
805     +0.14           0.11
331     +0.13           0.08
Negative Logits
MÁ          -0.97
fosfor      -0.97
soggior     -0.93
monaster    -0.93
susun       -0.92
hunde       -0.89
masaj       -0.89
affez       -0.89
guma        -0.89
Singapur    -0.89
Positive Logits
disagre     0.95
apprehen    0.92
shenan      0.91
I           0.83
I           0.82
unspeak     0.82
intersper   0.81
vainly      0.79
encomp      0.79
impelled    0.79
Activations Density: 0.257%