INDEX
Explanations
Occurrences of the word "look" followed by a number or by a specific phrase structure indicating a comparison.
Neuron Alignment
Index   Value   % of L₁
50      +0.24   0.9%
645     +0.08   0.3%
101     +0.08   0.3%
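The Neuron Alignment table can be read with a short sketch. It assumes, as in common SAE dashboard tooling, that "Value" is the feature's decoder weight on each neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The helper name `neuron_alignment` and the ranking by absolute weight are illustrative assumptions, not the dashboard's confirmed method.

```python
def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: top-k neurons by |decoder weight| and each one's share of the L1 norm.

    Assumes decoder_vec[i] is the feature's decoder weight on neuron i.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the decoder vector
    if l1 == 0:
        return []
    # Rank neuron indices by absolute weight, largest first; ties keep index order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    # Keep the signed weight for the "Value" column and the fraction for "% of L1".
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector whose L1 norm is exactly 1.0.
    for idx, value, share in neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2):
        print(f"{idx}  {value:+.2f}  {share:.1%}")
```

Under this reading, the top neuron's 0.9% share means the decoder vector is spread across many neurons rather than concentrated in one.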
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1356    +0.24           0.06
1799    +0.08           0.05
177     +0.08           0.05
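A minimal sketch of how a Correlated Neurons table could be computed, assuming both columns compare the feature's activations with each neuron's activations over the same set of tokens: Pearson correlation (mean-centred) and cosine similarity (not centred). This interpretation and the helper names `pearson`, `cosine` and `correlated_neurons` are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with the feature.

    feature_acts[t] is the feature's activation on token t;
    neuron_acts[j][t] is neuron j's activation on the same token t.
    """
    rows = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for j, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda r: r[1], reverse=True)  # highest correlation first
    return rows[:k]
```

Because cosine similarity does not subtract the mean, the two columns need not agree, which is consistent with the gap between the +0.24 and 0.06 values above.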
Negative Logits
Token      Logit
<bos>      -2.64
<?         -1.15
ⓧ          -0.95
           -0.94
/**        -0.92
<?         -0.90
/*         -0.86
/***       -0.68
lateinit   -0.65
pub        -0.59
Positive Logits
Token      Logit
maneu      1.65
affor      1.60
shenan     1.54
accla      1.52
impra      1.50
disreg     1.47
increa     1.42
unspeak    1.42
disagre    1.41
inev       1.38
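The two logit tables list the tokens whose output logits this feature's decoder direction most lowers and most raises. A minimal sketch, assuming the values are the direct effect of the decoder vector through the unembedding matrix (decoder_vec @ W_U) with any final layer norm ignored; the dashboard does not state its exact method, and `logit_effects` and its arguments are hypothetical names.

```python
def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper: the k tokens whose logits the feature most decreases / increases.

    decoder_vec has length d_model; W_U[d][t] is the unembedding weight from
    residual dimension d to token t. Any final LayerNorm is ignored here. Assumes k >= 1.
    """
    n_vocab = len(vocab)
    # Direct logit effect of the feature direction: decoder_vec @ W_U.
    logits = [sum(w * W_U[d][t] for d, w in enumerate(decoder_vec)) for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: logits[t])  # ascending by logit
    negative = [(vocab[t], logits[t]) for t in order[:k]]  # most negative first
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]  # most positive first
    return negative, positive
```

The positive entries above are subword fragments ("maneu", "affor", "shenan"), so each row refers to a single tokenizer piece rather than a whole word.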
Activation Density: 0.603%
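Activation density is presumably the fraction of sampled tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens on which the feature fires (activation > threshold)."""
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)  # count tokens where the feature is active
    return fired / len(acts)
```

On this reading, a density of 0.603% means the feature fired on roughly 6 of every 1,000 sampled tokens.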