Explanations
phrases related to assigning responsibility or fault to someone or something
Neuron Alignment
Index  Value  % of L₁
1961   +0.07  0.2%
896    +0.07  0.2%
650    +0.07  0.2%
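The Neuron Alignment table lists the model neurons whose weights line up most strongly with this feature. The following is a minimal sketch of how such a table could be computed. It assumes, hypothetically, that "Value" is the raw entry of the feature's decoder vector at each neuron index and that "% of L₁" is that entry's absolute value divided by the L₁ norm of the whole decoder vector. The function name `top_aligned_neurons` and the toy vector are illustrative only and are not the dashboard's actual code.

```python
def top_aligned_neurons(decoder_vec, k=3):
    """Return (index, value, pct_of_l1) for the k largest entries of decoder_vec.

    Assumption (hypothetical): decoder_vec is the feature's decoder direction,
    with one weight per model neuron.
    """
    # L1 norm of the whole decoder vector: sum of absolute weights.
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by their signed weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    # Express each top weight as a percentage of the L1 norm.
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy example: the L1 norm is 1.0, so each percentage equals 100 * |weight|.
print(top_aligned_neurons([0.5, -0.25, 0.25], k=2))  # [(0, 0.5, 50.0), (2, 0.25, 25.0)]
```

Under this reading, a weight of +0.07 being only 0.2% of L₁ means the feature's decoder direction is spread across many neurons rather than concentrated in a few.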
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
847    +0.07          0.02
1795   +0.07          0.03
1003   +0.07          0.03
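The Correlated Neurons table reports two similarity scores for each neuron. The following sketch computes both under stated assumptions. It assumes the correlation column is a Pearson correlation between the feature's activations and the neuron's activations over the same tokens, and that the cosine-similarity column compares two weight vectors, for example the feature's decoder direction and the neuron's weights. Which vectors the dashboard actually compares is an assumption here, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
```

The two scores measure different things. One is co-activation on data and the other is geometric overlap of weights, so they need not agree.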
Negative Logits
<bos>      -1.27
public     -0.69
protected  -0.69
Vegeu      -0.68
export     -0.67
sec        -0.65
/**        -0.64
</thead>   -0.63
生         -0.62
/**        -0.62
Positive Logits
blame      2.53
Blame      2.43
blame      2.18
blames     2.13
blaming    1.98
stockholm  1.95
blamed     1.95
Blame      1.91
accla      1.87
maneu      1.86
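The negative and positive logit lists show which output tokens this feature pushes down or up. The following is a minimal sketch of one common way to produce such lists. It assumes the feature's decoder direction is projected through the model's unembedding matrix (laid out d_model × vocab), with the lowest and highest scores then kept. Whether the dashboard also applies a final normalization is not stated here. The function `logit_effects` and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=2):
    """Project a feature direction onto the vocabulary.

    Assumption (hypothetical layout): unembed[i][t] is the weight from residual
    dimension i to token t, i.e. a d_model x vocab matrix as nested lists.
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    n_vocab = len(vocab)
    # Score for token t = dot product of the decoder vector with column t.
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(n_vocab)
    ]
    # Token indices ordered from lowest to highest score.
    order = sorted(range(n_vocab), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


print(logit_effects([1.0, 1.0], [[2.0, -1.0, 0.0], [0.5, 0.0, 0.1]], ["blame", "public", "the"], k=1))
# ([('public', -1.0)], [('blame', 2.5)])
```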
Activation Density: 0.161%
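Activation density is the share of tokens on which the feature fires. A density of 0.161% corresponds to about 1.6 tokens per 1,000. The following sketch assumes, as a labeled assumption, that "fires" means an activation strictly above zero. The `threshold` parameter and the function name are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    if not activations:
        return 0.0
    # Count tokens where the feature is active, then divide by all tokens seen.
    return sum(1 for a in activations if a > threshold) / len(activations)


print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25
```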