Explanations
quotation marks, indicating the presence of dialogue or quoted speech
Neuron Alignment
Index   Value   % of L₁
369     +0.16   0.9%
478     +0.15   0.9%
17      +0.15   0.8%
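
The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm (the sum of absolute values) of the feature's full weight vector. The sketch below illustrates that reading on a toy vector; the function name `percent_of_l1` is hypothetical and not taken from the page.

```python
def percent_of_l1(weights):
    """Return each weight's absolute value as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|. The source page
    does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / l1_norm for w in weights]


# Toy vector (not data from the page): L1 norm is 4.0.
shares = percent_of_l1([1.0, -3.0])
print(shares)  # [25.0, 75.0]
```

Under this reading, a value of +0.16 at 0.9% would imply an L₁ norm of roughly 18 for the full vector, which is consistent with the other two rows once rounding is taken into account.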
Correlated Neurons
Index   P. Corr.   Cos Sim.
100     +0.16      0.07
165     +0.15      0.05
505     +0.15      0.04
Negative Logits
Token        Logit
ization      -1.66
emann        -1.65
approach     -1.61
instrument   -1.61
apparatus    -1.59
ensation     -1.57
iative       -1.56
Approach     -1.54
izing        -1.52
ocene        -1.51
Positive Logits
Token          Logit
yourselves     2.40
yourself       2.14
ourselves      1.84
myself         1.72
harmless       1.50
EEEE           1.38
Magistrate     1.37
safe           1.34
cin            1.33
presidential   1.33
Activations Density: 0.279%