INDEX
Explanations
words related to central themes or focuses in a text
Neuron Alignment
Index    Value    % of L₁
872      +0.08    0.2%
683      +0.07    0.2%
1622     +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
384      +0.08       0.06
197      +0.07       0.05
1615     +0.07       0.05
Negative Logits
utop       -0.64
makro      -0.62
panik      -0.61
elek       -0.59
kram       -0.58
ideolog    -0.58
bayern     -0.58
solidar    -0.58
deko       -0.58
nark       -0.57
Positive Logits
specifically    0.57
primarily       0.56
focuses         0.54
centered        0.54
mainly          0.53
Ufficio         0.53
focused         0.53
pymysql         0.52
kolei           0.52
torno           0.52
Activations Density: 0.263%