INDEX
Explanations
verbs related to expressing opinions or observations
Neuron Alignment
Index   Value   % of L₁
1758    +0.10   0.3%
938     +0.09   0.3%
814     +0.09   0.3%
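The card does not define the Neuron Alignment columns. Below is a minimal sketch of one plausible reading. It assumes the feature's decoder vector is expressed in the neuron basis, that neurons are ranked by absolute weight, and that "% of L₁" is each weight's absolute value divided by the vector's L₁ norm. The helper `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons with the largest absolute weights in a feature's decoder vector.

    Each entry is (neuron index, weight, share of the L1 norm). Assumes
    decoder_vec[i] is the feature's weight on neuron i; this is an
    assumption, not something the dashboard states.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Rank neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical toy example: four neurons whose weights have an L1 norm of 1.0.
for index, value, share in neuron_alignment([0.1, -0.2, 0.3, 0.4], k=2):
    print(f"{index}  {value:+.2f}  {share:.1%}")
```

Under this reading, shares as small as 0.3% would suggest the feature's output is spread thinly across many neurons rather than concentrated in any single one.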
Correlated Neurons
Index   P. Corr.   Cos Sim.
1101    +0.10      0.03
1001    +0.09      0.04
814     +0.09      0.02
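Pearson correlation is the cosine similarity of mean-centered vectors, so the two columns can diverge when activations have a large mean offset. The sketch below assumes both columns compare the feature's activations with a neuron's activations over the same tokens. The card does not say which vectors are compared, and `pearson`, `cosine_similarity`, and the toy activations are illustrative only.

```python
import math


def cosine_similarity(xs, ys):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation: the cosine similarity of the mean-centered vectors."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return cosine_similarity([x - mx for x in xs], [y - my for y in ys])


# Hypothetical activations over five tokens.
feature = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron = [0.5, 0.1, 0.9, 0.3, 0.7]
print(f"P. Corr. {pearson(feature, neuron):+.2f}  Cos Sim. {cosine_similarity(feature, neuron):.2f}")
```

Both helpers divide by a norm, so a constant activation vector (zero variance) would raise `ZeroDivisionError` in `pearson`.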
Negative Logits
nece     -0.98
fte      -0.92
effe     -0.90
mef      -0.89
lein     -0.84
acce     -0.83
perfon   -0.83
fto      -0.83
fta      -0.82
ille     -0.81
Positive Logits
GEBURTSDATUM    0.73
guangdong       0.59
say             0.57
say             0.56
entious         0.55
ertion          0.55
OGND            0.55
autorytatywna   0.55
idated          0.52
mußte           0.52
Activations Density 0.187%
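The density figure is not defined on the card. Below is a sketch that assumes it is the fraction of token positions where the feature's activation is above zero. Under that reading, 0.187% means the feature fires on roughly one token in 535. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations over eight tokens; the feature fires on two of them.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activations Density {density:.3%}")
```

The `threshold` argument allows a stricter cutoff than "any positive activation" if small, noisy activations should not count as firing.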