Explanations
phrases indicating urgency or the need for prompt action
Neuron Alignment
Index   Value   % of L₁
358     +0.13   0.7%
241     +0.12   0.7%
485     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
225     +0.13      0.04
327     +0.12      0.03
485     +0.12      0.02
Negative Logits
ments       -1.74
ibilities   -1.56
ivities     -1.55
ctors       -1.52
shoulder    -1.48
ities       -1.45
ment        -1.44
Ms          -1.39
pts         -1.39
mate        -1.37
Positive Logits
á̏           1.76
pora        1.69
biamo       1.65
npmjs       1.61
blogger     1.52
endif       1.51
asone       1.42
elsen       1.42
sale        1.42
eller       1.35
Activations Density 0.226%