Explanations
references to specific requirements and conditions that need to be met for potential scenarios
Neuron Alignment
Index   Value   % of L₁
297     +0.11   0.3%
198     +0.10   0.3%
1009    +0.10   0.3%
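
One plausible reading of the "% of L₁" column is that each value's magnitude is expressed as a share of the L₁ norm (sum of absolute values) of the whole vector. The page does not state this definition, so the sketch below is an assumption, and `l1_share` and the toy vector are hypothetical names and data, not the dashboard's code.

```python
def l1_share(values, index):
    """Return |values[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means abs(value) / sum(abs(all values)).
    """
    l1_norm = sum(abs(v) for v in values)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(values[index]) / l1_norm


# Hypothetical toy vector, not the feature's real weights.
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of +0.11 at 0.3% would imply an L₁ norm of roughly 37, though the displayed percentages are rounded, so that figure is only approximate.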
Correlated Neurons
Index   P. Corr.   Cos Sim.
1009    +0.11      0.04
1592    +0.10      0.04
297     +0.10      0.06
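
The two columns measure different things, which is why they need not agree. Assuming "P. Corr." denotes Pearson correlation (how two sets of activations co-vary across inputs) and "Cos Sim." denotes cosine similarity (the angle between two vectors), a minimal sketch looks like this. The function names and sample inputs are hypothetical, and this is not necessarily how the dashboard computes its figures.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Pearson correlation subtracts each sequence's mean before comparing, while cosine similarity does not, so the two coincide only when both inputs already have zero mean.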
Negative Logits
emphat      -0.90
suspic      -0.90
vorrei      -0.90
disgra      -0.88
eiffel      -0.86
hentai      -0.85
accla       -0.85
rispond     -0.85
dises       -0.83
intersper   -0.82
Positive Logits
חיצוניים           0.66
are                0.60
AssemblyCulture    0.56
were               0.56
AutoresizingMask   0.56
useParams          0.55
DisplayMetrics     0.55
diali              0.55
micas              0.53
those              0.52
Activations Density: 0.556%
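
Activation density is commonly read as the fraction of sampled tokens on which the feature fires at all. The page does not define the term, so the sketch below assumes "fires" means an activation strictly above a threshold of zero. `activation_density` and the sample list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumption: a feature counts as active on a token when its
    activation exceeds the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on one of four tokens.
sample = [0.0, 0.0, 1.3, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

On that definition, a density of 0.556% would correspond to the feature being active on roughly 1 in 180 sampled tokens.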