INDEX
Explanations
mentions of specific academic and political figures, as well as topics related to medical studies and research
Neuron Alignment
Index   Value   % of L₁
1967    +0.17   0.6%
198     +0.16   0.5%
50      +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.17      0.05
68      +0.16      0.04
506     +0.15      0.03
Negative Logits
mistak        -0.76
relenting     -0.73
alnız         -0.69
leçons        -0.68
thinkable     -0.65
ⓧ             -0.62
procédures    -0.61
wavering      -0.61
prochaines    -0.61
peines        -0.60
Positive Logits
fta         1.20
hcm         1.19
fte         1.17
ftu         1.17
aen         1.17
gend        1.13
vian        1.13
lele        1.12
meis        1.11
fordable    1.10
Activations Density 0.142%