INDEX
Explanations
questions or statements indicating doubt or uncertainty
Neuron Alignment
Index   Value   % of L₁
919     +0.09   0.3%
468     +0.08   0.2%
347     +0.08   0.2%
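The "% of L₁" column can be read as each neuron's weight divided by the L₁ norm of the feature's whole weight vector over neurons. The sketch below assumes that reading, assumes the vector is the feature's decoder vector, and assumes neurons are ranked by signed value. The dashboard states none of these. `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder, k=3):
    """Return the k neurons with the largest (signed) decoder weights.

    Each entry is (neuron index, weight, share of the vector's L1 norm).
    Assumption: ranking is by signed value, share uses |weight| / L1.
    """
    l1 = sum(abs(w) for w in decoder)  # L1 norm of the whole decoder vector
    ranked = sorted(range(len(decoder)), key=lambda i: decoder[i], reverse=True)
    return [(i, decoder[i], abs(decoder[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    toy = [0.01, -0.02, 0.09, 0.03, 0.08, -0.05]  # hypothetical decoder vector
    for idx, w, share in neuron_alignment(toy):
        print(f"{idx:>5}  {w:+.2f}  {share:.1%}")
```

A small share such as 0.3% means the feature is spread across many neurons and is not aligned with any single one.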
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
504     +0.09           0.04
919     +0.08           0.02
1982    +0.08           0.03
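The two columns above are standard similarity measures. This card does not say which vectors the cosine-similarity column compares. This sketch assumes both columns compare the feature's activations with a neuron's activations over the same tokens. Pearson correlation centers both series first; cosine similarity does not. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity: dot product over the product of L2 norms (no centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    feature_acts = [0.0, 0.0, 1.2, 0.0, 0.7]  # hypothetical per-token activations
    neuron_acts = [0.1, -0.3, 0.9, 0.2, 0.4]
    print(pearson(feature_acts, neuron_acts), cosine(feature_acts, neuron_acts))
```

Because feature activations are mostly zero, the two measures can differ noticeably on the same pair of series.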
Negative Logits
pernic    -0.84
lele      -0.82
ché       -0.77
uhr       -0.76
anton     -0.76
ardu      -0.76
albic     -0.74
dises     -0.73
ria       -0.72
Kategor   -0.71
Positive Logits
?         0.70
؟         0.66
?’        0.65
?)        0.64
?'        0.64
()?       0.63
?:        0.62
?”        0.61
?"        0.61
?}        0.61
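Logit lists like these usually rank vocabulary tokens by the feature direction's direct effect on the output logits: the feature vector is projected through the unembedding. The sketch below assumes `decoder` already lives in the same d-dimensional space as the rows of `unembed`. It also assumes the scores are unscaled, though the dashboard may normalize them. `logit_effects` and the toy matrices are hypothetical.

```python
def logit_effects(decoder, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature direction.

    decoder: length-d list; unembed: d x V nested list; vocab: V token strings.
    Returns (top_positive, top_negative) as lists of (token, score).
    """
    d, V = len(decoder), len(vocab)
    # score[t] = sum_i decoder[i] * unembed[i][t]  (a vector-matrix product)
    scores = [sum(decoder[i] * unembed[i][t] for i in range(d)) for t in range(V)]
    order = sorted(range(V), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    vocab = ["?", "ché", "the"]  # hypothetical tiny vocabulary
    unembed = [[0.7, -0.8, 0.0], [0.1, 0.0, 0.2]]
    print(logit_effects([1.0, 0.5], unembed, vocab, k=1))
```

Under this reading, the positive list (question marks in several scripts and punctuation contexts) matches the feature's explanation about questions and doubt.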
Activation Density: 0.248%
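Activation density is commonly the fraction of tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above a threshold of zero. The helper name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires above `threshold`."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    acts = [0.0, 0.0, 1.3, 0.0]  # hypothetical per-token activations
    print(f"{activation_density(acts):.3%}")
```

Read this way, a density of 0.248% means the feature fires on roughly 1 token in 400.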