INDEX
Explanations
short descriptions or summaries about online services or products
Neuron Alignment
Index   Value   % of L₁
50      +0.31   1.7%
1177    +0.12   0.7%
381     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1937    +0.31      0.14
68      +0.12      0.11
658     +0.10      0.11
Negative Logits
<bos>            -3.23
///**            -0.84
/***             -0.81
WriteAttribute   -0.74
//---            -0.71
RTDA             -0.69
RTLR             -0.68
ItemBackground   -0.68
PerformLayout    -0.67
JspWriter        -0.66
Positive Logits
maneu      2.07
impra      1.94
increa     1.94
affor      1.86
accla      1.85
reluct     1.82
disagre    1.82
unspeak    1.77
shenan     1.75
indescri   1.71
Activations Density: 1.650%