INDEX
Explanations
mentions of problematic car issues and customer service interactions
Neuron Alignment
Index   Value   % of L₁
1842    +0.25   0.8%
184     +0.24   0.8%
674     +0.17   0.6%
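The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives the absolute weight on a neuron as a share of the L1 norm of the feature's full weight vector. A minimal sketch under that assumption follows; the function name `l1_share` and the toy vector are hypothetical and not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return the share of the vector's L1 norm carried by weights[index].

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The page does not confirm this.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero, so the share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector for illustration only (not dashboard data).
weights = [0.25, -0.5, 0.25, 1.0]
print(f"{l1_share(weights, 0):.1%}")
```

If this reading is right, a value of +0.25 at 0.8% would imply an L1 norm of roughly 31 for the full vector, but that figure rests entirely on the assumed definition.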
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.25      0.04
1842    +0.24      0.05
1150    +0.17      0.02
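"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The page does not say which vectors each is computed over, so the sketch below only shows how the two standard measures relate: Pearson correlation is the cosine similarity of the mean-centered vectors, which is one reason the two columns can differ for the same neuron. The function names and toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity after mean-centering."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Toy vectors for illustration only: b is an affine function of a,
# so the correlation is perfect while the cosine similarity is not.
a = [1.0, 2.0, 3.0]
b = [3.0, 5.0, 7.0]
print(cosine_similarity(a, b), pearson_correlation(a, b))
```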
Negative Logits
encomp      -1.15
inev        -1.14
fortn       -1.14
increa      -1.11
wherea      -1.10
reluct      -1.07
affor       -1.05
unden       -1.04
volunte     -1.03
michelin    -1.02
Positive Logits
تضيفلها             0.59
prior               0.59
allegedly           0.50
aguchi              0.49
Walkover            0.48
>=",                0.48
supposed            0.47
earlier             0.46
ContentAlignment    0.46
originally          0.46
Activations Density 0.608%