INDEX
Explanations
words related to the influence or impact of various factors on situations or subjects
Neuron Alignment
Index    Value    % of L₁
156      +0.21    1.2%
23       +0.18    1.0%
376      +0.15    0.9%
Correlated Neurons
Index    P. Corr.    Cos Sim.
148      +0.21       0.01
277      +0.18       0.02
200      +0.15       0.02
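The P. Corr. and Cos Sim. columns above report two different similarity measures, which is why a neuron can show a Pearson correlation of +0.21 alongside a cosine similarity near 0.01. The page does not state which vectors each measure is computed over, so the following is only a minimal sketch of the two standard formulas on hypothetical vectors; the function names `pearson_corr` and `cosine_sim` are illustrative and are not taken from the dashboard.

```python
import numpy as np


def pearson_corr(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float(a_c @ b_c / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))


def cosine_sim(a, b):
    """Cosine similarity: normalized dot product, with no mean-centering."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical vectors: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(pearson_corr(a, b))  # the offset is removed by centering
print(cosine_sim(a, b))    # the offset still affects the angle
```

Because Pearson correlation subtracts the mean first, it ignores constant offsets, whereas cosine similarity does not, so the two columns need not agree.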
Negative Logits
clamation    -1.77
liche        -1.72
sburg        -1.69
illac        -1.64
illard       -1.61
ublic        -1.60
gio          -1.59
pherd        -1.56
CRIPTION     -1.54
neys         -1.52
Positive Logits
Ļª           2.39
¥            2.36
ł            2.22
¶            2.11
©            2.07
Ł            2.07
Ĩ            2.04
ķ            2.03
ĺ            2.01
¬            2.00
Activations Density 0.098%
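An activation density such as the 0.098% above is commonly read as the fraction of tokens on which the feature fires at all. The page does not define the statistic, so the sketch below assumes that reading; the function name `activation_density`, the `threshold` parameter, and the synthetic data are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Synthetic example: 98 active tokens out of 100,000 gives a density of 0.098%.
acts = np.zeros(100_000)
acts[:98] = 1.0
print(f"{activation_density(acts):.3%}")
```

Under this reading, the feature would be active on roughly one token in a thousand.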