INDEX
Explanations
the word "try" and phrases containing advice or recommendations
Neuron Alignment
Index    Value    % of L₁
874      +0.13    0.4%
120      +0.12    0.4%
1778     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
120      +0.13       0.05
874      +0.12       0.05
1778     +0.12       0.04
Negative Logits
bander             -0.61
Singapur           -0.61
RectangleBorder    -0.59
ì                  -0.54
Napole             -0.53
Reden              -0.53
Ruman              -0.52
Veter              -0.52
canes              -0.51
morm               -0.51
Positive Logits
try       1.06
TRY       1.00
Try       0.94
tries     0.89
Try       0.89
tried     0.88
trying    0.85
tried     0.85
try       0.81
trying    0.80
Activations Density 0.086%