INDEX
Explanations
phrases related to writing or inventing stories and documents, sometimes in a critical or speculative context
Neuron Alignment
Index    Value    % of L₁
1147     +0.07    0.2%
967      +0.07    0.2%
151      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1958     +0.07       0.04
1513     +0.07       0.01
1273     +0.07       0.03
Negative Logits
increa      -1.09
affor       -1.07
purcha      -1.06
michelin    -1.02
coö         -1.02
unden       -1.02
guarante    -1.02
impra       -1.02
alre        -1.01
wherea      -1.01
Positive Logits
resort          0.82
resorted        0.74
instead         0.70
rely            0.70
resorts         0.67
alternative     0.66
Instead         0.65
alternatives    0.64
resorting       0.63
Instead         0.63
Activations Density 0.429%