INDEX
Explanations
instances of parentheses and their contents
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
136     +0.14   0.8%
271     +0.14   0.8%
412     +0.13   0.7%
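The Neuron Alignment rows can be read as the neurons that the feature's decoder vector weights most heavily. The sketch below is written under two assumptions that the page does not state: Value is the decoder weight the feature places on each MLP neuron, and % of L₁ is the magnitude of that weight as a share of the decoder vector's L1 norm. The name neuron_alignment and its arguments are hypothetical, not part of any published tool.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption: the dashboard ranks neurons by signed decoder weight, and
    "% of L1" is |weight| divided by the L1 norm of the whole decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Sort neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (hypothetical values) with an L1 norm of 1.0.
print(neuron_alignment([0.5, -0.25, 0.25, 0.0], k=2))
```

On a real decoder vector, a neuron holding +0.14 at 0.8% of L₁ implies that the weight is spread thinly over many neurons, which is what a low-alignment feature looks like.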
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
208     +0.14           0.06
136     +0.14           -0.00
250     +0.13           0.06
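Both numeric columns in the Correlated Neurons table are standard similarity measures. The sketch below computes them in plain Python. It assumes the correlation column is the Pearson correlation between this feature's activations and a neuron's activations over the same tokens. The cosine column compares two direction vectors, but the page does not say which ones, so those inputs are left generic. The helper names pearson and cosine are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (e.g. feature vs. neuron
    activations on the same tokens; that pairing is an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two direction vectors (which directions the
    dashboard compares is not stated on the page)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly correlated toy series
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal directions
```

The two measures can disagree, as for neuron 136 above: a neuron can fire weakly alongside the feature (correlation +0.14) while its direction is nearly orthogonal to the one it is compared against (cosine about 0).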
Negative Logits
Token            Logit
(blank)          -3.70
↵↵↵              -3.70
(blank)          -3.70
(blank)          -3.70
↵                -3.70
↵                -3.70
↵                -3.70
↵                -3.70
↵ Âł             -3.70
<|outofrange|>   -3.70
Positive Logits
Token    Logit
úblic    1.44
urban    1.44
aliana   1.42
reve     1.42
gmail    1.41
erald    1.38
oard     1.36
uten     1.35
acket    1.34
udes     1.34
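The negative and positive logit lists rank vocabulary tokens by how much the feature pushes their logits down or up. The sketch below shows one way such lists can be produced. It assumes that a token's effect is the feature's decoder vector mapped into the residual stream by the MLP output weights (w_out), then projected onto that token's unembedding row (w_u), with no layer norm in between. The function name, argument names, and toy matrices are all hypothetical.

```python
def logit_effects(decoder_vec, w_out, w_u, vocab, k=3):
    """Return the k most negative and k most positive (token, logit effect) pairs.

    Assumed shapes: decoder_vec has n_neurons entries, w_out is
    d_model rows x n_neurons columns, w_u is vocab_size rows x d_model columns.
    """
    # Feature direction in neuron space -> residual-stream direction.
    resid = [sum(w * d for w, d in zip(row, decoder_vec)) for row in w_out]
    # Residual direction -> direct effect on each token's logit.
    logits = [sum(u * r for u, r in zip(row, resid)) for row in w_u]
    order = sorted(range(len(vocab)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: 2 neurons, 2-dimensional residual stream, 4-token vocabulary.
neg, pos = logit_effects(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [-1.0, 0.0], [0.0, 5.0], [0.5, 0.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)
print(pos)
```

When many tokens share the same unembedding direction, they receive identical effects. That would be one way for a list to show the same value on every row, as the negative list above does at -3.70.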
Activation Density: 0.263%
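Activation density is usually read as the fraction of tokens on which the feature is active. At 0.263%, that is roughly 263 of every 100,000 tokens. The sketch below assumes that "active" means a strictly positive activation, as after a ReLU. The name activation_density is a hypothetical helper.

```python
def activation_density(activations):
    """Fraction of tokens whose feature activation is strictly positive
    (the "> 0" threshold is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Toy run: 2 of 8 tokens fire.
print(f"{activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]):.3%}")
```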