INDEX
Explanations
phrases related to conflicts, confrontations, and political opinions
Neuron Alignment
Index   Value   % of L₁
397     +0.15   0.5%
32      +0.12   0.4%
1482    +0.12   0.4%
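The "% of L₁" column appears to give each value's magnitude as a share of the L₁ norm (sum of absolute values) of the whole weight vector. Below is a minimal sketch under that assumption. The helper name l1_share and the toy vector are hypothetical and are not the dashboard's own code or data.

```python
def l1_share(vector: list[float], index: int) -> float:
    """Return |vector[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" = abs(component) / sum of abs(all components).
    """
    l1_norm = sum(abs(x) for x in vector)
    if l1_norm == 0:
        raise ValueError("vector has zero L1 norm")
    return 100.0 * abs(vector[index]) / l1_norm


# Hypothetical toy vector, not the feature's real weights.
toy = [0.5, -0.25, 0.25]
print(l1_share(toy, 0))  # 50.0
```

Under this reading the rows above agree with each other. +0.15 at 0.5% and +0.12 at 0.4% both imply an L₁ norm of roughly 30 (0.15 / 0.005 = 0.12 / 0.004 = 30), though rounding in the displayed values leaves some slack.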
Correlated Neurons
Index   P. Corr.   Cos Sim.
397     +0.15      0.04
32      +0.12      0.04
395     +0.12      0.04
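"P. Corr." most likely refers to Pearson correlation between the feature's activations and each neuron's activations over a token sample. "Cos Sim." is presumably the cosine similarity between the feature's weight direction and the neuron's weights. The dashboard does not state which vectors are compared, so both readings are assumptions. The sketch below only shows the two standard formulas on hypothetical toy data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (e.g. per-token activations)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two weight vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy data.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The two metrics answer different questions. One measures co-activation on data and the other measures weight geometry. This may explain why a neuron can show a +0.15 correlation while its cosine similarity is only 0.04.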
Negative Logits
oltán       -0.50
lustre      -0.47
unehmen     -0.46
genicity    -0.45
علق         -0.45
liness      -0.44
IndexError  -0.43
IOError     -0.43
carelessly  -0.43
tragung     -0.42
Positive Logits
inder       0.94
thermomix   0.94
dovr        0.92
sappi       0.91
sopr        0.90
migli       0.88
solidar     0.86
dichi       0.83
dises       0.83
erec        0.82
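The negative and positive logit lists read as the tokens whose output logits this feature pushes down or up the most. A common way to build such lists is to take the dot product of the feature's output direction with each token's unembedding vector, then keep the k highest and k lowest scores. That this dashboard uses exactly this procedure is an assumption. The helpers logit_scores and top_and_bottom, and the dict-based unembedding layout, are hypothetical.

```python
import heapq


def logit_scores(direction: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Dot product of a feature direction with each token's unembedding vector.

    Assumption: `unembed` maps token -> unembedding vector (hypothetical layout).
    """
    return {tok: sum(d * w for d, w in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    items = scores.items()
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


# A few scores copied from the lists above; ties keep their input order.
pos, neg = top_and_bottom({"inder": 0.94, "thermomix": 0.94, "oltán": -0.50, "lustre": -0.47}, 2)
print(pos)  # [('inder', 0.94), ('thermomix', 0.94)]
print(neg)  # [('oltán', -0.5), ('lustre', -0.47)]
```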
Activation Density: 0.120%
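Activation density is most naturally read as the percentage of sampled tokens on which the feature fires. Below is a minimal sketch, assuming "fires" means an activation strictly above zero. The helper name activation_density and its threshold parameter are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`.

    Assumption: density = firing tokens / all sampled tokens, where firing
    means activation > 0 unless another threshold is given.
    """
    if not activations:
        raise ValueError("no activations supplied")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```

At 0.120%, the feature would fire on roughly 1.2 tokens per 1,000, which makes it quite sparse.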