Explanations
phrases related to medical warnings and side effects of medication
Neuron Alignment
Index   Value   % of L₁
50      +0.29   1.1%
453     +0.08   0.3%
1343    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1380    +0.29      0.01
117     +0.08      0.03
852     +0.07      0.04
Negative Logits
<bos>       -2.18
جغرافيا     -0.66
held        -0.65
hold        -0.65
public      -0.65
力          -0.65
put         -0.64
Савез       -0.63
run         -0.62
</tbody>    -0.61
Positive Logits
affor       1.94
maneu       1.83
accla       1.72
volunte     1.71
increa      1.70
Confe       1.69
disagre     1.68
fortn       1.68
stockholm   1.67
milf        1.65
Activations Density: 0.462%