INDEX
Explanations
strong emotional reactions and evaluations in text
Neuron Alignment
Index   Value   % of L₁
1253    +0.11   0.3%
1473    +0.09   0.3%
666     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1473    +0.11      0.03
1154    +0.09      0.04
1965    +0.08      0.04
Negative Logits
كومونز   -0.66
noten   -0.59
farbe   -0.59
poliuret   -0.58
новниш   -0.56
ortop   -0.56
клопе   -0.55
جغرافيا   -0.55
InkWell   -0.54
ThemeData   -0.54
Positive Logits
suscep   +1.50
maneu   +1.45
unwarran   +1.37
disreg   +1.36
disagre   +1.36
depic   +1.36
increa   +1.33
indestru   +1.31
unden   +1.30
reluct   +1.29
Activations Density: 0.375%