INDEX
Explanations
instances where uncertainty or lack of clarity is expressed
Neuron Alignment
Index   Value   % of L₁
1265    +0.15   0.5%
1356    +0.11   0.3%
188     +0.09   0.3%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
247     +0.15           0.04
1265    +0.11           0.04
188     +0.09           0.03
Negative Logits
melat       -0.66
Simult      -0.61
Román       -0.61
prodi       -0.60
Haci        -0.60
apparti     -0.60
Domínguez   -0.57
Joaqu       -0.57
Chá         -0.57
Luglio      -0.57
Positive Logits
whether         0.77
unsure          0.75
uncertain       0.73
whether         0.66
uncertainty     0.64
Whether         0.61
uncertainties   0.59
Uncertainty     0.57
Uncertain       0.56
uncertainty     0.56
Activations Density: 0.147%