INDEX
Explanations
instances where something is "spotted" or "sighted"
Neuron Alignment
Index   Value   % of L₁
50      +0.09   0.3%
597     +0.07   0.2%
1919    +0.06   0.2%
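
The page does not define the "% of L₁" column. One plausible reading, assumed here and not confirmed by the source, is that it gives each neuron's absolute value as a share of the L1 norm of the whole vector. Under that reading the rows are mutually consistent with an L1 norm on the order of 30 (for example, 0.09 / 0.003 = 30). The sketch below shows the calculation on a hypothetical toy vector; the function name `l1_share` and the toy values are illustrative and are not taken from the page.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The source page
    does not state this definition.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1


# Hypothetical toy vector, NOT the real feature's weights.
toy = [0.09, -0.07, 0.06, 0.78]
print(f"{l1_share(toy, 0):.1f}%")
```

With a real feature the vector would have thousands of entries, which is why the top-ranked neurons can each account for well under 1% of the total.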
Correlated Neurons
Index   P. Corr.   Cos Sim.
1439    +0.09      0.03
1919    +0.07      0.02
739     +0.06      0.02
Negative Logits
<bos>          -1.17
//}            -0.60
usercontent    -0.60
|}             -0.59
alignSelf      -0.58
cdk            -0.58
lateinit       -0.56
move           -0.56
displayquote   -0.55
Betracht       -0.55
Positive Logits
increa     1.97
affor      1.92
maneu      1.87
emphat     1.82
thut       1.80
strick     1.79
disagre    1.79
Juf        1.75
impra      1.73
depic      1.71
Activations Density: 0.097%