INDEX
Explanations
references to "side" and "sides," indicating a focus on aspects or perspectives of a situation
Neuron Alignment
Index    Value    % of L₁
256      +0.13    0.7%
287      +0.12    0.7%
226      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
42       +0.13       0.02
343      +0.12       0.03
287      +0.11       0.03
Negative Logits
itude         -2.01
live          -1.60
Spacewatch    -1.51
ifax          -1.48
«             -1.48
sleepy        -1.46
onde          -1.46
ulance        -1.43
itness        -1.42
itte          -1.42
Positive Logits
kick          2.37
walks         2.06
wall          1.89
ographies     1.87
plates        1.81
plays         1.78
velt          1.78
plate         1.72
piece         1.70
walls         1.69
Activations Density: 0.131%