INDEX
Explanations
verbs and adjectives related to personal behavior
Neuron Alignment
Index    Value    % of L₁
1921     +0.13    0.4%
75       +0.13    0.4%
1271     +0.09    0.3%
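The dashboard does not define the "% of L₁" column. One plausible reading, offered here as an assumption rather than the tool's documented method, is that it reports the absolute value of a single neuron's weight as a share of the L₁ norm (sum of absolute values) of the feature's whole weight vector. The sketch below illustrates that reading on toy numbers; the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: this mirrors what a "% of L1" column might report.
    It is not the dashboard's confirmed formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector (hypothetical values, not real feature weights).
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, small percentages such as those in the table would mean the feature's weight is spread across many neurons rather than concentrated in a few.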
Correlated Neurons
Index    P. Corr.    Cos Sim.
1921     +0.13       0.04
75       +0.13       0.03
890      +0.09       0.03
Negative Logits
logis    -0.65
starte   -0.62
buk      -0.61
tages    -0.60
puc      -0.60
opio     -0.59
Lider    -0.59
stopp    -0.58
mola     -0.58
dises    -0.58
Positive Logits
Tend       1.09
tend       1.03
Tend       0.98
tended     0.92
TEND       0.91
tends      0.86
tending    0.86
tend       0.81
tendance   0.76
tendency   0.74
Activations Density 0.052%