INDEX
Explanations
mentions of specific locations and events, potentially related to news articles or incidents
Neuron Alignment
Index    Value    % of L₁
1573     +0.10    0.3%
404      +0.10    0.3%
1565     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
776      +0.10       0.07
791      +0.10       0.05
1771     +0.10       0.06
Negative Logits
accla       -1.05
snoopy      -1.00
indescri    -0.97
impra       -0.95
sputnik     -0.94
vagu        -0.93
intrigu     -0.93
volunte     -0.93
shenan      -0.93
indestru    -0.92
Positive Logits
(">>        0.66
kristal     0.58
estekak     0.58
eseorang    0.58
гиоз        0.57
kalori      0.57
krim        0.56
farbe       0.55
otheby      0.54
malen       0.54
Activations Density 0.230%