Explanations
phrases related to official documents and statements
Neuron Alignment
Index    Value    % of L₁
50       +0.19    0.8%
1810     +0.10    0.4%
971      +0.09    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
313      +0.19       0.04
1810     +0.10       0.04
161      +0.09       0.04
Negative Logits
<bos>               -2.70
<?                  -0.88
ⓧ                   -0.84
/**                 -0.79
&___                -0.78
(no visible token)  -0.72
jsPsych             -0.68
bezeichneter        -0.67
EndGlobalSection    -0.64
Jeografia           -0.64
Positive Logits
milf       1.09
wikihow    0.96
affor      0.92
ricardo    0.91
disreg     0.90
🤣🤣       0.90
lmfao      0.90
excru      0.89
scrat      0.88
verona     0.88
Activations Density: 0.235%