Explanations
facts or statements emphasizing the truthfulness or accuracy of information
Neuron Alignment
Index    Value    % of L₁
50       +0.25    0.9%
36       +0.07    0.2%
2034     +0.07    0.2%
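The "% of L₁" column is not defined on the page. One plausible reading, offered here only as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the feature's full weight vector. A minimal sketch under that assumption, using a toy vector rather than the feature's real weights (the name `l1_share` is hypothetical):

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: this mirrors what a "% of L1" column might report; the page
    itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(w) / l1_norm for w in weights]


# Toy vector for illustration only, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
shares = l1_share(toy_weights)
```

Under this reading, small percentages such as those in the table would indicate that the feature's weight is spread over many neurons rather than concentrated in a few.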
Correlated Neurons
Index    P. Corr.    Cos Sim.
1784     +0.25       0.07
1473     +0.07       0.06
605      +0.07       0.03
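"P. Corr." and "Cos Sim." presumably abbreviate Pearson correlation and cosine similarity; the page does not say which vectors are being compared, so the inputs below are placeholders. The sketch shows the standard definitions of the two metrics, with Pearson correlation written as the cosine similarity of mean-centered vectors. The function names are illustrative, not taken from any dashboard code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy inputs for illustration only, not data from the dashboard.
rising = [1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0]
r_same = pearson_correlation(rising, rising)
r_opposite = pearson_correlation(rising, falling)
cos_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The two metrics can disagree: cosine similarity is sensitive to a shared offset in both vectors, while Pearson correlation removes it, which is one reason a dashboard might report both.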
Negative Logits
Token                  Logit
<bos>                  -2.33
ⓧ                      -0.89
<?                     -0.84
<?                     -0.80
/***                   -0.78
///**                  -0.77
/*                     -0.72
(missing in source)    -0.67
endwhile               -0.66
//};                   -0.61
Positive Logits
Token          Logit
véhic          1.09
soulign        1.02
récomp         0.95
maroc          0.94
originaire     0.94
quoique        0.93
fortn          0.92
Juf            0.92
chrétien       0.91
gettyimages    0.91
Activations Density 0.990%
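Activation density is commonly understood as the fraction of sampled tokens on which a feature has a nonzero activation; the page does not spell out its exact definition or sample, so the following is a minimal sketch under that assumption. The name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard does not state its formula.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy per-token activations for illustration only, not data from the dashboard.
toy_activations = [0.0, 0.0, 1.3, 0.0]
density = activation_density(toy_activations)
```

On that reading, a density of 0.990% would mean the feature fires on roughly one token in a hundred of the sampled text.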