INDEX
Explanations
instances of physical actions or interactions between individuals
Neuron Alignment
Index   Value   % of L₁
1177    +0.15   0.5%
690     +0.10   0.3%
1533    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
509     +0.15      0.07
569     +0.10      0.07
690     +0.10      0.04
Negative Logits
kosme         -0.67
ideolog       -0.66
kriminal      -0.66
AfterClass    -0.66
parlamentar   -0.64
praktik       -0.63
arit          -0.61
mision        -0.61
subjek        -0.61
klinik        -0.60
Positive Logits
disreg        1.21
malheureux    1.18
jurassic      1.09
milf          1.09
pixar         1.08
malheur       1.08
hentai        1.07
shenan        1.03
POETRY        1.02
simpsons      1.00
Activations Density: 0.481%