INDEX
Explanations
mentions of the names "Trump" and "Zuma"
Neuron Alignment
Index    Value    % of L₁
1978     +0.14    0.4%
1741     +0.11    0.3%
478      +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.14       0.09
1001     +0.11       0.06
1957     +0.11       0.04
Negative Logits
initComponents    -0.72
Atsauces    -0.67
صوتيه    -0.66
pesta    -0.65
alkoh    -0.65
autorytatywna    -0.64
पया    -0.63
הערות    -0.63
melat    -0.63
İstinadlar    -0.63
Positive Logits
:'(    0.82
Whence    0.82
lmfao    0.81
indestru    0.80
hahah    0.79
😭😭    0.78
himself    0.77
outlander    0.75
;-;    0.74
madonna    0.73
Activations Density 0.460%