Explanations
descriptive words related to a person's character
Neuron Alignment
Index   Value   % of L₁
1446    +0.11   0.3%
605     +0.10   0.3%
940     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
605     +0.11      0.02
1446    +0.10      0.04
1528    +0.07      0.04
Negative Logits
Intere   -1.98
inev     -1.95
mef      -1.92
dises    -1.91
makro    -1.89
seiz     -1.88
wien     -1.87
Keny     -1.84
squa     -1.83
oner     -1.83
Positive Logits
compassionate   0.70
always          0.69
caring          0.68
personality     0.68
strictEqual     0.67
kindness        0.66
loves           0.66
compassion      0.66
willing         0.65
love            0.65
Activations Density 0.547%