INDEX
Explanations
phrases related to political and social commentary, including terms like "cultural critique," "religious and racial categories," and "immigration sentiments."
Neuron Alignment
Index   Value   % of L₁
604     +0.13   0.4%
381     +0.11   0.3%
513     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
82      +0.13      0.06
1548    +0.11      0.04
25      +0.10      0.05
Negative Logits
Token      Logit
bordeaux   -1.46
Juf        -1.43
fluo       -1.41
franz      -1.35
casio      -1.34
lyon       -1.32
dises      -1.30
Châ        -1.30
canel      -1.29
levis      -1.29
Positive Logits
Token        Logit
doesn        0.85
happens      0.80
seems        0.80
helps        0.79
It           0.79
enables      0.79
is           0.79
allows       0.79
does         0.78
represents   0.78
Activations Density 0.232%