INDEX
Explanations
mentions of specific usernames in social media related to requests or provocations
Neuron Alignment
Index   Value   % of L₁
1757    +0.16   0.6%
25      +0.14   0.6%
538     +0.14   0.5%
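The "% of L₁" column is not defined on this page. One plausible reading, offered only as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm (the sum of absolute values) of the whole weight vector. The sketch below illustrates that reading; the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by
    the sum of absolute values of all entries.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25, 0.0]
share = l1_share(toy_weights, 0)
print(f"{share:.1f}%")
```

Under this reading, shares below 1% for the top-aligned neurons would mean the feature's weight is spread thinly across many neurons rather than concentrated in a few.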
Correlated Neurons
Index   P. Corr.   Cos Sim.
538     +0.16      0.05
981     +0.14      0.05
390     +0.14      0.04
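The page does not say how "P. Corr." and "Cos Sim." are computed. Assuming they stand for Pearson correlation and cosine similarity, the minimal sketch below shows how the two measures differ: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and toy inputs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy inputs, not data from the dashboard.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]
cos_uv = cosine_similarity(u, v)
corr_uv = pearson_correlation(u, v)
```

In this toy case the two vectors point in broadly similar directions yet are perfectly anti-correlated, so the two columns can disagree substantially for the same pair.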
Negative Logits
tramonto      -0.56
ethene        -0.55
medesimo      -0.53
paesaggio     -0.52
допомогти     -0.52
cammino       -0.51
ethane        -0.50
ritratto      -0.50
cristianismo  -0.47
tentativo     -0.47
Positive Logits
lü         0.98
hek        0.96
minimalis  0.94
klo        0.93
konkre     0.92
makro      0.91
kompati    0.90
alkoh      0.89
panik      0.88
stoff      0.88
Activations Density 0.256%