INDEX
Explanations
phrases related to individuals and events on social media
Neuron Alignment
Index   Value   % of L₁
1150    +0.12   0.4%
1577    +0.10   0.3%
906     +0.10   0.3%
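The Neuron Alignment entries can be read as the largest weights in the feature's decoder vector over the neuron basis. Below is a minimal sketch. It assumes three things: `decoder_row[i]` is the weight the feature writes to neuron `i`, "Value" is that raw weight, and "% of L₁" is its absolute value divided by the L₁ norm of the whole row. The helper name `neuron_alignment` and the toy row are hypothetical, not the dashboard's own code.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by decoder weight for a single feature (hypothetical helper).

    Assumes decoder_row[i] is the weight this feature writes to neuron i and
    that "% of L1" means |weight| / sum(|w| for w in decoder_row).
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neurons by signed weight, largest first; sorted() is stable, so ties keep index order.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Toy four-neuron decoder row (made-up numbers, not this feature's weights).
row = [0.25, -1.0, 0.5, 0.25]
for index, value, share in neuron_alignment(row, k=2):
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Ranking by signed weight matches the table, which lists only positive values. A variant ranked by absolute weight would also surface strongly negative neurons.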
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1439    +0.12           0.05
1032    +0.10           0.05
1765    +0.10           0.05
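One common way to fill a Correlated Neurons table is to compare the feature's activations with each neuron's activations over the same tokens. It reports both the Pearson correlation (centered) and the cosine similarity (uncentered). The sketch below assumes that reading and plain Python lists of per-token activations. The helpers `pearson`, `cosine` and `correlated_neurons` are hypothetical illustrations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def cosine(xs, ys):
    """Uncentered cosine similarity of two activation sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature (hypothetical helper).

    feature_acts[t] is the feature's activation on token t; neuron_acts[i][t] is
    neuron i's activation on the same token t.
    """
    rows = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for i, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]
```

Because one score is centered and the other is not, the two columns can disagree. This is especially likely for a feature that is zero on most tokens.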
Negative Logits
Token           Value
Vegeu           -0.61
árbol           -0.60
considération   -0.59
droje           -0.58
iseite          -0.57
pymysql         -0.56
churrasco       -0.56
smör            -0.56
asteroide       -0.56
Capacidad       -0.55
Positive Logits
Token           Value
profi           0.70
twit            0.67
ftw             0.65
strick          0.64
XNUMX           0.63
diverti         0.61
espres          0.61
emble           0.60
upvoted         0.60
laun            0.59
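Logit lists like these are usually read as the feature's direct effect on the output vocabulary. The decoder direction is multiplied by the unembedding matrix, and the most positive and most negative tokens are reported. The sketch assumes a `d_model × vocab` unembedding stored as nested lists and ignores any final layer norm or scaling the dashboard may apply. The helper `logit_effects` and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Most positive and most negative direct logit effects of one feature (hypothetical helper).

    decoder_dir[j] is the feature's decoder weight on residual dimension j;
    unembed[j][t] is the unembedding weight from dimension j to token t.
    Any final layer norm is ignored.
    """
    d_model, n_vocab = len(decoder_dir), len(vocab)
    scores = [sum(decoder_dir[j] * unembed[j][t] for j in range(d_model)) for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy 2-dimensional model with a four-token vocabulary (made-up numbers).
positive, negative = logit_effects(
    [1.0, -1.0],
    [[0.5, 0.0, -0.25, 1.0], [0.0, 0.5, 0.5, 0.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print("positive:", positive)
print("negative:", negative)
```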
Activation Density: 0.225%
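Activation density is typically the fraction of sampled tokens on which the feature activates above zero. At 0.225%, that is about 2.25 activating tokens per 1,000. The sketch below assumes a flat list of per-token activations and a threshold of 0. The helper `activation_density` is hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (hypothetical helper)."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)


# Made-up sample: 1,000 tokens, two of which activate the feature.
sample = [0.0] * 997 + [0.5, 1.2, 0.0]
print(f"Activation Density: {activation_density(sample):.3%}")
```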