Explanations
phrases related to persuading or convincing someone to take a specific action
Neuron Alignment
Index   Value   % of L₁
1842    +0.15   0.4%
1403    +0.10   0.3%
334     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
813     +0.15      0.04
1166    +0.10      0.05
1553    +0.09      0.05
Negative Logits
Messieurs   -1.00
poichè      -0.91
exé         -0.91
quoique     -0.87
rafra       -0.85
Sén         -0.85
goû         -0.82
véhic       -0.82
nuage       -0.80
Áng         -0.80
Positive Logits
<bos>       1.02
click       0.60
product     0.59
تضيفلها     0.57
products    0.57
clicking    0.56
AsUp        0.56
stik        0.55
geforce     0.55
user        0.55
Activations Density: 0.460%