INDEX
Explanations
mentions of advertisements and advert content
Neuron Alignment
Index   Value   % of L₁
156     +0.27   1.7%
221     +0.14   0.9%
376     +0.12   0.8%
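One common reading of the "% of L₁" column is each weight's share of the L₁ norm of the feature's decoder vector. Under that assumption, the three rows are consistent with a row L₁ norm of about 15 to 16 (0.27 / 0.017 ≈ 15.9, 0.12 / 0.008 = 15). The minimal sketch below ranks neurons by signed decoder weight and reports that share. The function name neuron_alignment and the toy row are hypothetical, and the dashboard may rank or normalize differently.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k largest entries of one feature's decoder row.

    Each result is (neuron_index, weight, percent_of_l1), where percent_of_l1
    is |weight| divided by the L1 norm of the whole row, times 100.
    Assumption: neurons are ranked by signed weight, largest first.
    """
    l1 = sum(abs(w) for w in decoder_row)
    top = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)[:k]
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in top]


# Toy decoder row with 5 neurons (made-up numbers, not this feature's weights).
row = [0.10, -0.50, 0.27, 0.00, 0.13]
for idx, weight, pct in neuron_alignment(row):
    print(f"{idx:>5}  {weight:+.2f}  {pct:.1f}%")
```

Because the percentages are taken against the whole row, a feature whose decoder weight is spread over many neurons shows small shares even for its top neurons, as this one does.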
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
221     +0.27           0.02
109     +0.14           0.01
250     +0.12           0.01
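The correlation and cosine-similarity columns can be illustrated by treating the feature and each neuron as activation series over the same tokens. This is an assumption: the dashboard may instead compute cosine similarity between weight vectors. The sketch shows the key difference between the two measures. Pearson correlation centers both series before comparing them, and cosine similarity does not. The names correlated_neurons and neuron_acts, along with the toy data, are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of the centered series over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity: the same kind of ratio, but without centering the series first."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature over the same tokens.

    neuron_acts maps a neuron index to its activation on each token (hypothetical layout).
    Returns (neuron_index, pearson, cosine) tuples, highest correlation first.
    """
    rows = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for i, acts in neuron_acts.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


# Toy data: a sparse feature that fires on token 2 only, and three neurons.
feature = [0.0, 0.0, 1.0, 0.0]
neurons = {0: [0.1, 0.2, 0.9, 0.1], 1: [1.0, 1.0, 0.0, 1.0], 2: [0.5, 0.4, 0.6, 0.5]}
for idx, corr, cos in correlated_neurons(feature, neurons):
    print(f"{idx:>5}  {corr:+.2f}  {cos:.2f}")
```

Neuron 1 in the toy data shows why the columns can disagree. It is perfectly anti-correlated with the feature (Pearson -1), yet its cosine similarity is 0 because the two series never overlap.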
Negative Logits
TRODUCTION    -1.62
else          -1.45
documentclass -1.44
sudden        -1.41
quartile      -1.40
isex          -1.39
Argued        -1.38
jours         -1.35
rapeutics     -1.34
Ski           -1.30
Positive Logits
icum          1.80
pora          1.68
iction        1.64
ios           1.60
ue            1.60
ong           1.60
opan          1.58
lei           1.58
ilate         1.57
uits          1.56
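Logit tables like the two above are usually built by projecting the feature's output direction onto the unembedding matrix and keeping the most negative and most positive tokens. The sketch below assumes this method and ignores the final layer norm. The names logit_effects, matvec_rows, and the toy vocabulary are hypothetical. If the decoder row lives in MLP-neuron space, you would first map it to the residual stream with matvec_rows(decoder_row, W_out), where W_out is a hypothetical name for the MLP output weights.

```python
def matvec_rows(vec, matrix):
    """Compute vec @ matrix, with matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, matrix)) for j in range(len(matrix[0]))]


def logit_effects(direction, unembed, vocab, k=3):
    """Direct effect of a residual-stream direction on each token's logit.

    unembed has one row per residual dimension and one column per vocab entry.
    Returns (most_negative, most_positive) lists of (token, effect) pairs.
    Layer norm before the unembedding is ignored in this sketch.
    """
    effects = matvec_rows(direction, unembed)
    order = sorted(range(len(vocab)), key=lambda t: effects[t])
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return bottom, top


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
unembed = [[0.5, -1.0, 2.0, 0.0],
           [9.0, 9.0, 9.0, 9.0]]
negative, positive = logit_effects([1.0, 0.0], unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

These are direct effects only. A feature can still influence tokens in the other list indirectly through later layers.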
Activation Density: 0.022%
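Activation density is typically the percentage of sampled tokens on which the feature's activation is nonzero. The exact threshold and token sample used here are not stated, so treat the sketch below as an assumption. Under that reading, 0.022% means the feature fires on roughly one token in 4,500 (1 / 0.00022 ≈ 4,545). The function name activation_density is hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which the feature activation exceeds the threshold.

    Assumption: "fires" means activation > threshold, with threshold 0 by default.
    """
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Toy activations: the feature fires on 2 of 5 tokens.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2]))
```

With a density this low, a few thousand tokens may contain only one or two firings, so estimates from small samples are noisy.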