INDEX
Explanations
phrases related to encouragement or emphasis on a particular aspect
Neuron Alignment
Index   Value   % of L₁
50      +0.11   0.4%
15      +0.06   0.2%
1598    +0.06   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1973    +0.11      0.03
1902    +0.06      0.02
599     +0.06      0.02
Negative Logits
<bos>     -1.45
<!--      -0.88
/***      -0.86
///**     -0.80
/*!       -0.79
/**       -0.77
<?        -0.72
          -0.71
/*        -0.68
glColor   -0.66
Positive Logits
ecru        1.71
maneu       1.48
impra       1.45
bordeaux    1.44
!...        1.44
increa      1.42
accla       1.42
swarovski   1.41
?...        1.41
affor       1.40
Activations Density 0.115%