INDEX
Explanations
mentions of scores or numbers presented in a specific format
Neuron Alignment
Index    Value    % of L₁
50       +0.14    0.4%
1581     +0.10    0.3%
11       +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
378      +0.14       0.03
921      +0.10       0.03
923      +0.09       0.03
Negative Logits
]<=       -0.77
]**       -0.74
("="      -0.72
])*       -0.70
))^{      -0.69
(":");    -0.67
)|^{      -0.67
ecuted    -0.66
]>=       -0.66
Làm       -0.66
Positive Logits
accla        1.32
intersper    1.29
encomp       1.20
vagu         1.18
maneu        1.12
depic        1.11
razer        1.11
milf         1.10
wattpad      1.09
contribut    1.09
Activations Density: 0.057%