INDEX
Explanations
emotional emphasis within the text
Neuron Alignment
Index   Value   % of L₁
396     +0.13   0.7%
391     +0.11   0.6%
381     +0.11   0.6%
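The page does not define the "% of L₁" column. A minimal sketch, assuming it is each entry's absolute value divided by the L₁ norm (sum of absolute values) of the full vector; the function name `l1_share` and the toy vector are hypothetical and not taken from the dashboard:

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: this is how a "% of L1" column could be derived; the page
    itself does not state the formula.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in weights]
    return [abs(w) / l1 for w in weights]


# Toy vector (illustrative only, not the feature's real weights).
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)
```

Under this reading, a value of +0.13 at 0.7% would imply an L₁ norm of roughly 18.6 for the full vector, which is also consistent with +0.11 rounding to 0.6%.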
Correlated Neurons
Index   P. Corr.   Cos Sim.
18      +0.13      0.03
391     +0.11      0.02
446     +0.11      0.02
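The "P. Corr." and "Cos Sim." columns presumably report Pearson correlation and cosine similarity. The page does not say which vectors are compared, so the sketch below only illustrates the two standard formulas on toy data; the helper names `pearson` and `cosine` are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy series (illustrative only): b is a with a constant offset added.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
r = pearson(a, b)
c = cosine(a, b)
```

Pearson correlation subtracts the mean first, so a constant offset leaves it unchanged, while cosine similarity does not. The two columns can therefore differ in magnitude for the same pair.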
Negative Logits
rapped       -1.53
gebra        -1.51
ways         -1.46
homosexual   -1.41
googleapis   -1.41
fork         -1.40
atural       -1.36
ctions       -1.35
wegian       -1.34
fts          -1.33
Positive Logits
ulation      1.70
ulator       1.68
bourg        1.67
bers         1.66
©            1.63
ulators      1.63
ulsions      1.60
ħ            1.59
ersion       1.59
erton        1.57
Activations Density 0.132%
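The page does not define activation density. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations (illustrative only): one of four tokens fires.
sample = [0.0, 0.0, 0.5, 0.0]
density = activation_density(sample)
```

Under this reading, a density of 0.132% would mean the feature fires on roughly 1 in 760 tokens.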