INDEX
Explanations
texts related to assigning blame or responsibility
Neuron Alignment
Index   Value   % of L₁
50      +0.08   0.3%
1328    +0.06   0.2%
544     +0.06   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
188     +0.08      0.02
995     +0.06      0.02
768     +0.06      0.02
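The two metric columns in the Correlated Neurons table can be illustrated with a minimal sketch. It assumes that "P. Corr." abbreviates the Pearson correlation and "Cos Sim." the cosine similarity; the page does not say which vectors are being compared, so the inputs below are made-up placeholders and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all left unnormalized
    # because the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Placeholder data (an assumption, not taken from the page).
feature_acts = [0.0, 0.5, 0.0, 1.0]
neuron_acts = [0.1, 0.4, 0.0, 0.9]
print(pearson(feature_acts, neuron_acts))
print(cosine(feature_acts, neuron_acts))
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two columns can diverge for the same pair, as they do in the table above.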
Negative Logits
<bos>         -1.45
/***          -0.87
///**         -0.83
/**           -0.82
/*            -0.72
fileprivate   -0.65
<?            -0.61
InkWell       -0.60
usercontent   -0.59
butterknife   -0.58
Positive Logits
fault         2.23
Fault         2.08
fault         1.93
Fault         1.82
faults        1.73
faults        1.67
Faults        1.25
FAULT         1.24
jaya          1.17
faulting      1.17
Activations Density: 0.090%