INDEX
Explanations
pronouns referring to possession or affiliation
Neuron Alignment
Index   Value   % of L₁
1385    +0.14   0.4%
1741    +0.12   0.4%
1937    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
131     +0.14      0.06
1937    +0.12      0.07
332     +0.11      0.06
Negative Logits
gaily       -1.11
inconce     -1.10
excru       -1.10
disagre     -1.09
impra       -1.06
apprehen    -1.04
indescri    -1.04
fortn       -1.04
reluct      -1.00
depic       -0.99
Positive Logits
reputa      0.70
own         0.69
entire      0.64
eyes        0.58
OGND        0.58
ὁ           0.58
مرئيه       0.56
parents     0.55
Darum       0.54
astéro      0.54
Activations Density: 0.322%