INDEX
Explanations
mentions of the word "complicit" or variations of it
Neuron Alignment
Index   Value   % of L₁
241     +0.15   0.6%
1306    +0.14   0.6%
1983    +0.14   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.15      0.03
976     +0.14      0.03
1306    +0.14      0.03
Negative Logits
shenan      -0.82
intersper   -0.75
underval    -0.75
horrend     -0.73
miscon      -0.68
indestru    -0.68
disgra      -0.68
reluct      -0.64
fucker      -0.63
endow       -0.63
Positive Logits
Comp         1.25
comp         1.20
COMP         1.15
comp         1.14
Comp         1.14
COMP         1.07
komp         0.88
komp         0.87
compaction   0.86
COM          0.83
Activations Density 0.096%