Explanations
phrases indicating contrast or opposition
Neuron Alignment
Index    Value    % of L₁
50       +0.13    0.6%
1404     +0.05    0.3%
605      +0.05    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1570     +0.13       0.04
1305     +0.05       0.03
1785     +0.05       0.03
Negative Logits
<bos>     -1.56
          -0.89
ⓧ         -0.79
<?        -0.78
public    -0.77
<?        -0.77
/**       -0.71
/*        -0.71
<>        -0.64
/*!       -0.64
Positive Logits
maneu        2.10
affor        1.97
increa       1.86
impra        1.82
stockholm    1.78
wien         1.78
lidl         1.72
inev         1.71
aen          1.67
accla        1.65
Activations Density: 0.109%