INDEX
Explanations
promotional messages or calls to action typically related to news or articles
Neuron Alignment
Index   Value   % of L₁
50      +0.17   0.9%
971     +0.12   0.6%
699     +0.10   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
971     +0.17      0.03
2036    +0.12      0.03
699     +0.10      0.03
Negative Logits
<bos>     -3.05
ⓧ         -1.10
(blank)   -1.07
<?        -0.94
/***      -0.87
/**       -0.85
<?        -0.84
#![       -0.78
/*        -0.76
↘         -0.71
Positive Logits
lele        1.73
wien        1.61
maroc       1.59
marseille   1.54
milano      1.53
bayern      1.52
napoli      1.52
riviera     1.50
bandung     1.49
dises       1.49
Activations Density 0.101%