INDEX
Explanations
adjectives conveying negative evaluations
Neuron Alignment
Index   Value   % of L₁
1265    +0.14   0.5%
468     +0.13   0.4%
188     +0.12   0.4%
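The Neuron Alignment table lists the neurons with the largest weights in the feature's direction, together with each weight's share of that direction's L₁ norm. The following is a minimal sketch of that computation. It assumes the Value column is the feature's decoder weight on each neuron and that neurons are ranked by absolute weight. The function name `neuron_alignment` and the toy vector are hypothetical and are not taken from the dashboard's source.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by |decoder weight|, with each weight's share of the L1 norm.

    Hypothetical helper: assumes `decoder_row` is the feature's decoder vector
    expressed in the neuron basis (one float per neuron).
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by weight magnitude, largest first (assumed ordering).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron decoder vector (made-up numbers, not the feature above).
for idx, value, share in neuron_alignment([0.1, -0.3, 0.05, 0.25, 0.0]):
    print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a top weight of +0.14 that is only 0.5% of the L₁ norm implies a norm in the mid-20s to around 30. That would suggest the feature is spread thinly across many neurons rather than aligned with any single one.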
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
468     +0.14           0.06
188     +0.13           0.05
1363    +0.12           0.05
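Correlated Neurons compares the feature with individual neurons by their activations rather than their weights. The following is a minimal sketch. It assumes, as in common SAE dashboards, that the correlation column is the Pearson correlation and Cos Sim. is the cosine similarity between the feature's and the neuron's activations over the same tokens. The helper names and toy data are hypothetical. The toy neuron is the feature plus a constant offset, which shows why the two columns can disagree: Pearson correlation centers both series and ignores the offset, while cosine similarity does not.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


def cosine(xs, ys):
    """Uncentered cosine similarity of two activation series."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    if nx == 0 or ny == 0:
        raise ValueError("cosine undefined for an all-zero series")
    return dot / (nx * ny)


feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]        # sparse feature (toy data)
neuron_acts = [a + 1.0 for a in feature_acts]   # same pattern plus a constant offset
print(round(pearson(feature_acts, neuron_acts), 3), round(cosine(feature_acts, neuron_acts), 3))
```

Here the Pearson correlation is 1.0, but the cosine similarity is only about 0.894. This is one way a sparse feature can show a higher correlation than cosine similarity against a neuron whose baseline activation is nonzero.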
Negative Logits
solicited      -0.49
interessanti   -0.47
scoper         -0.47
Perché         -0.46
-------        -0.45
Preço          -0.44
authorised     -0.44
warran         -0.43
buone          -0.43
riguard        -0.42
Positive Logits
marseille      0.84
cannes         0.80
carrefour      0.80
marte          0.80
wretch         0.80
popoli         0.80
cristi         0.77
fuo            0.76
cabrio         0.76
bourg          0.75
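The Negative and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. The following is a minimal sketch. It assumes each score is the dot product of the feature's decoder direction with that token's unembedding column, which is the usual direct-logit reading; for simplicity, it ignores any final LayerNorm. The helper `logit_effects`, the 2-dimensional toy model, and the token names are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper: `unembed` is a d_model x vocab_size nested list (W_U),
    so column j is token j's unembedding vector.
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, s))
    scores.sort(key=lambda pair: pair[1])
    return scores[:k], scores[::-1][:k]


decoder_dir = [1.0, -0.5]            # toy feature direction (d_model = 2)
unembed = [
    [0.8, 0.2, -0.4, 0.0],
    [0.0, 0.6, 0.4, -0.2],
]
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
negative, positive = logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", [(t, round(s, 2)) for t, s in negative])
print("positive:", [(t, round(s, 2)) for t, s in positive])
```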
Activation Density 0.246%
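Activation density is the share of tokens on which the feature fires. The following is a minimal sketch. It assumes "fires" means a strictly positive activation, the usual convention for ReLU-style SAE features. It uses a hypothetical `activation_density` helper on toy data: one nonzero activation in 400 tokens gives 0.250%, the same order of magnitude as this feature's 0.246%.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not acts:
        raise ValueError("need at least one token")
    return sum(1 for a in acts if a > threshold) / len(acts)


toy_acts = [0.0] * 399 + [1.7]  # one active token out of 400 (made-up data)
print(f"{activation_density(toy_acts):.3%}")
```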