INDEX
Explanations
expressions of love and preferences
Neuron Alignment
Index   Value   % of L₁
1035    +0.13   0.4%
869     +0.13   0.4%
874     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
869     +0.13      0.05
1035    +0.13      0.04
1565    +0.11      0.03
Negative Logits
verwijspagina     -0.64
Caratteristiche   -0.61
JvmStatic         -0.61
RuntimeObject     -0.60
Economía          -0.60
AfterEach         -0.59
Demografía        -0.59
Statistiche       -0.59
Trayectoria       -0.58
JsonKey           -0.58
Positive Logits
maneu      1.59
impra      1.46
reluct     1.45
shenan     1.38
inappro    1.38
wherea     1.36
snoopy     1.36
depic      1.35
guarante   1.34
scrat      1.33
Activations Density: 0.072%