INDEX
Explanations
terms related to political and social ideologies, as well as specific names and topics within these fields
Neuron Alignment
Index    Value    % of L₁
1842     +0.15    0.5%
2033     +0.10    0.3%
1150     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
227      +0.15       0.09
1682     +0.10       0.05
862      +0.08       0.02
Negative Logits
Token          Logit
tinte          -0.77
parati         -0.75
soggior        -0.74
onor           -0.70
ridu           -0.67
télécharge     -0.66
alun           -0.66
augus          -0.65
allarg         -0.65
lomb           -0.63
Positive Logits
Token          Logit
vainly         0.86
shewn          0.77
quitted        0.76
impelled       0.76
unspeak        0.70
apprehen       0.70
assailed       0.69
rejoiced       0.67
disagre        0.67
shuddered      0.67
Activations Density 0.978%