INDEX
Explanations
facts or statements presented as supporting evidence for an argument or claim
Neuron Alignment
Index   Value   % of L₁
1950    +0.13   0.4%
1482    +0.13   0.4%
872     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1950    +0.13      0.04
1482    +0.13      0.04
1306    +0.10      0.03
Negative Logits
nomine            -0.60
<=",              -0.59
solidar           -0.59
aspira            -0.54
Geografia         -0.53
IntoConstraints   -0.53
veter             -0.53
SourceChecksum    -0.53
Hva               -0.52
noten             -0.52
Positive Logits
disreg      0.97
unspeak     0.91
apprehen    0.87
cushi       0.85
hairc       0.81
gaily       0.81
cuck        0.80
unwarran    0.80
ineffec     0.79
pooh        0.77
Activations Density 0.079%