INDEX
Explanations
textual elements of dialogue or conversation
Neuron Alignment
Index   Value   % of L₁
161     +0.18   1.0%
204     +0.16   0.9%
233     +0.15   0.9%
Correlated Neurons
Index   P. Corr.   Cos Sim.
204     +0.18      0.16
161     +0.16      0.05
342     +0.15      0.07
Negative Logits
encing   -1.49
"(       -1.41
ences    -1.39
dig      -1.30
ând      -1.29
lives    -1.29
Domain   -1.27
"%       -1.26
ky       -1.25
vere     -1.24
Positive Logits
ľĵ   3.70
©    3.62
ħ    3.61
ı    3.56
IJ   3.55
ĺ    3.53
¨    3.50
ŀ    3.43
Ł    3.42
ª    3.40
Activations Density: 7.784%