Explanations
sentences that end in "to" followed by high activations
Neuron Alignment
Index   Value   % of L₁
50      +0.26   1.3%
1334    +0.10   0.5%
156     +0.10   0.5%
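In the Neuron Alignment table, the "% of L₁" column appears to express each value as a share of the L₁ norm of the whole vector. For example, 0.26 at 1.3% implies an L₁ norm of roughly 20, and 0.10 at 0.5% gives the same figure. Below is a minimal sketch of that computation. It assumes the vector is the feature's direction expressed in the neuron basis, which this page does not confirm. `top_alignment` and the toy vector are hypothetical.

```python
def top_alignment(vector, k=3):
    """Return (index, value, percent of L1 norm) for the k largest-magnitude entries.

    Assumption (not confirmed by the dashboard): `vector` is the feature's
    direction written in the neuron basis, one entry per neuron.
    """
    l1 = sum(abs(x) for x in vector)
    if l1 == 0:
        raise ValueError("vector has zero L1 norm")
    # Rank neuron indices by absolute value, largest first.
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return [(i, vector[i], 100.0 * abs(vector[i]) / l1) for i in ranked[:k]]


# Toy vector (hypothetical, not the dashboard's data).
example = [0.1, -0.3, 0.5, 0.0, 0.1]
for index, value, pct in top_alignment(example, k=2):
    print(f"{index:>5}  {value:+.2f}  {pct:.1f}%")
```

Under this reading, a small percentage for the top entry (1.3% here) means the feature is spread across many neurons rather than aligned with any single one.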
Correlated Neurons
Index   P. Corr.   Cos Sim.
1334    +0.26      0.05
395     +0.10      0.03
869     +0.10      0.04
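The Correlated Neurons columns are most plausibly a Pearson correlation between the feature's activations and each neuron's activations over the same tokens ("P. Corr.") and a cosine similarity between the feature's direction and the neuron's weight vector ("Cos Sim."). Both readings are assumptions inferred from the abbreviations. The sketch below implements the two standard formulas in plain Python. The function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant series; this sketch reports 0.
        return 0.0
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("zero vector")
    return dot / (nu * nv)


# Toy data (hypothetical, not from the dashboard).
feature_acts = [0.0, 1.2, 0.0, 3.1, 0.4]
neuron_acts = [0.1, 0.9, -0.2, 2.5, 0.3]
print(f"P. Corr.: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim.: {cosine_similarity([1.0, 0.0, 2.0], [0.5, 1.0, 1.0]):.2f}")
```

The two measures can disagree, as they do in the table (+0.26 correlation against 0.05 cosine similarity for neuron 1334): activations can co-vary even when the underlying weight directions are nearly orthogonal.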
Negative Logits
<bos>     -2.96
ⓧ         -0.80
/**       -0.77
<?        -0.76
          -0.70
الد       -0.63
putnik    -0.59
/*        -0.57
//        -0.57
外        -0.56
Positive Logits
lamborghini   1.54
scrat         1.53
affor         1.52
maneu         1.48
impra         1.47
accla         1.41
disreg        1.39
panama        1.38
isuzu         1.37
Minang        1.36
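The two logit lists are usually read as the tokens whose output logits the feature most decreases (Negative Logits) and most increases (Positive Logits). The sketch below assumes each score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix, a "logit lens" style projection. This page does not state that method. `top_logits`, the tiny vocabulary and the vectors are all hypothetical.

```python
def top_logits(direction, unembed_columns, vocab, k=3):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    Assumption: score(token) = direction . unembed_columns[token], where
    `direction` is the feature's decoder vector (length d_model) and each
    unembedding column is that token's output vector (also length d_model).
    """
    scores = [
        (token, sum(d * w for d, w in zip(direction, column)))
        for token, column in zip(vocab, unembed_columns)
    ]
    ranked = sorted(scores, key=lambda pair: pair[1])  # most negative first
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
columns = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.0]]
positive, negative = top_logits([1.0, -1.0], columns, vocab, k=2)
print("Positive:", positive)
print("Negative:", negative)
```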
Activation Density: 0.224%
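If activation density is the percentage of sampled tokens on which the feature's activation is strictly positive (an assumed definition), then 0.224% corresponds to the feature firing on roughly one token in 450. Below is a minimal sketch of that calculation. `activation_density`, its `threshold` parameter and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "fires" means activation > threshold, with threshold 0 by default.
    """
    if not activations:
        raise ValueError("no activations")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample (hypothetical): 2 of 8 tokens fire.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.3, 0.0]
print(f"Activation density: {activation_density(sample):.3f}%")
```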