Explanations
expressions indicating shock or disbelief
Neuron Alignment
Index   Value   % of L₁
906     +0.09   0.2%
199     +0.08   0.2%
674     +0.07   0.2%
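The dashboard does not say how the Neuron Alignment values are computed. One common reading is assumed here. Under it, Value is the feature's signed decoder weight on each MLP neuron, and % of L₁ is that weight's magnitude divided by the L₁ norm of the whole decoder row. The minimal Python sketch below illustrates that assumption. `neuron_alignment` is a hypothetical helper, not part of any dashboard API.

```python
from typing import List, Tuple


def neuron_alignment(decoder_row: List[float], k: int = 3) -> List[Tuple[int, float, float]]:
    """Hypothetical helper: top-k neurons by signed decoder weight.

    Assumes "Value" is the signed weight w_i and "% of L1" is
    |w_i| / sum_j |w_j| (returned here as a fraction, not a percentage).
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the whole decoder row
    if l1 == 0:
        raise ValueError("decoder row has zero L1 norm")
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 4-neuron decoder row; its L1 norm is 1 + 1 + 4 + 2 = 8.
    for idx, value, share in neuron_alignment([1.0, -1.0, 4.0, 2.0]):
        print(f"{idx}  {value:+.2f}  {share:.1%}")
```

Under this reading, each of the top three neurons holds only about 0.2% of the L₁ norm. That would mean no single neuron carries much of the feature's weight.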
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1795    +0.09           0.02
1450    +0.08           0.01
88      +0.07           0.02
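Correlated Neurons reports two different similarity measures. The sketch below assumes the following:

- P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens.
- Cos Sim. is the cosine similarity between two weight vectors.

The dashboard does not state which weight vectors it compares. Both helpers are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length activation traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two weight vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (nu * nv)
```

Under this assumption, the two columns measure different things, so they need not agree. A neuron can co-activate weakly with the feature while its weights point in a nearly orthogonal direction.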
Negative Logits
Token         Logit
Shreve        -0.47
Taub          -0.47
Daven         -0.45
Karp          -0.45
Reiche        -0.45
Kinn          -0.45
Bayard        -0.45
Vogt          -0.45
Respectfully  -0.45
Byp           -0.45
Positive Logits
Token         Logit
soggior       0.92
pymysql       0.78
appunt        0.75
venons        0.71
costumi       0.69
pantaloni     0.69
ristor        0.68
abiti         0.68
caratteri     0.66
sogni         0.65
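The Negative Logits and Positive Logits lists rank vocabulary tokens by how much the feature's output direction pushes their logits down or up. The minimal sketch below assumes each token's score is the dot product of the decoder direction with that token's unembedding vector, and it ignores any final LayerNorm scaling. `logit_effects` and the toy vocabulary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: top-k and bottom-k tokens by direct logit effect.

    Assumes the score for a token is the dot product of the feature's
    decoder direction with that token's unembedding vector; any final
    LayerNorm scaling is ignored.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional residual stream with a 3-token vocabulary.
    vocab = {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0]}
    pos, neg = logit_effects([1.0, 0.0], vocab, k=1)
    print("positive:", pos)  # [('a', 2.0)]
    print("negative:", neg)  # [('b', -1.0)]
```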
Activation Density: 0.121%
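Activation density is the fraction of tokens on which the feature is active. Assuming "active" means an activation above zero, 0.121% corresponds to roughly 1.2 firing tokens per 1,000. The hypothetical helper below computes that fraction from a list of per-token activations. The threshold parameter is an assumption, not a documented dashboard setting.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # the feature "fires" on this token
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # 3 active tokens out of 2,000 gives a density of 0.15%.
    acts = [0.0] * 1997 + [0.4, 1.2, 0.1]
    print(f"{activation_density(acts):.3%}")
```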