INDEX
Explanations
references to philosophical or scientific terms related to rationalism
Neuron Alignment
Index   Value   % of L₁
528     +0.16   0.6%
1480    +0.14   0.5%
1705    +0.12   0.4%
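The dashboard does not say how these columns are computed. Here is a minimal sketch, assuming "Value" is the feature's weight on each neuron and "% of L₁" is that weight's magnitude divided by the L₁ norm of the feature's full weight vector. The function name `neuron_alignment` and its input are hypothetical. Under that reading, the first row implies an L₁ norm of roughly 0.16 / 0.006 ≈ 27.

```python
def neuron_alignment(weights, k=3):
    """Return the k neurons with the largest |weight| for one feature.

    `weights` is a hypothetical list holding the feature's weight on each
    neuron (an assumption; the dashboard does not name its source vector).
    Each result is (neuron_index, weight, percent_of_l1).
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm of the whole weight vector
    # Sort neuron indices by weight magnitude, largest first; ties keep input order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in order[:k]]
```

For example, `neuron_alignment([0.25, -0.5, 0.125, 0.125])` returns `[(1, -0.5, 50.0), (0, 0.25, 25.0), (2, 0.125, 12.5)]`. Ranking by magnitude rather than signed value lets strongly negative weights appear too, although all three rows shown above happen to be positive.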
Correlated Neurons
Index   P. Corr.   Cos Sim.
1705    +0.16      0.04
143     +0.14      0.04
1480    +0.12      0.04
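Here is a minimal sketch of the two similarity columns. It assumes "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is the cosine similarity between two weight vectors, such as the feature's weight vector and a vector associated with the neuron. Both readings are assumptions, and the helper names and inputs are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (division by zero) if a series is constant


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

The two measures can disagree, as in the table above. Correlation depends on how activations co-vary on real data, while cosine similarity compares only the fixed weight directions.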
Negative Logits
lara       -0.97
bordeaux   -0.93
aen        -0.88
michelin   -0.85
lyon       -0.85
bauer      -0.84
doman      -0.84
jacobs     -0.84
fep        -0.84
sii        -0.83
Positive Logits
ist       0.92
ists      0.82
cist      0.69
tist      0.65
IST       0.64
logist    0.62
tists     0.59
ism       0.58
ist       0.57
istic     0.55
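Here is a minimal sketch of how token lists like the two above are commonly produced. It assumes each value is the direct effect of the feature's output direction on a token's logit, meaning the dot product of that direction with the token's column of the unembedding matrix. The dashboard does not state whether it normalizes these values, so this sketch omits any normalization. `top_logit_effects`, its arguments, and the toy vocabulary are hypothetical.

```python
def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Return (most negative, most positive) tokens for one feature direction.

    w_dec: hypothetical feature output direction, length d_model.
    w_u:   hypothetical unembedding, d_model rows each of length len(vocab).
    vocab: token strings, one per unembedding column.
    """
    n_vocab = len(vocab)
    # Direct logit effect of the feature on each token: dot(w_dec, w_u[:, t]).
    effects = [sum(w_dec[d] * w_u[d][t] for d in range(len(w_dec))) for t in range(n_vocab)]
    ranked = sorted(range(n_vocab), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in ranked[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(ranked[-k:])]
    return negative, positive
```

With toy inputs, `top_logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["ist", "lyon", "ism"], k=1)` returns `([("lyon", -1.0)], [("ism", 2.0)])`. The second row of `w_u` has no effect because the feature direction is zero in that dimension.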
Activation Density: 0.137%
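Here is a minimal sketch, assuming density is the percentage of tokens on which the feature's activation is strictly above zero. The `threshold` parameter is a hypothetical addition. Under that reading, 0.137% means the feature fires on about 137 of every 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    active = sum(1 for a in activations if a > threshold)  # count firing tokens
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 0.0, 1.5, 0.0])` returns `25.0`.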