INDEX
Explanations
phrases related to diminishing or reducing something
Neuron Alignment
Index   Value   % of L₁
1265    +0.13   0.5%
662     +0.11   0.4%
1334    +0.11   0.4%
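A minimal sketch of how a table like Neuron Alignment could be derived. It assumes that "Value" is the feature's decoder weight on each MLP neuron, that neurons are ranked by signed weight, and that "% of L₁" is the weight's magnitude divided by the L₁ norm of the whole decoder vector. None of this is confirmed by the dashboard itself. The helper `top_neuron_alignment` and the toy weights are hypothetical.

```python
def top_neuron_alignment(decoder_weights, k=3):
    """Return (index, value, share of L1 norm) for the k largest weights.

    Hypothetical helper: assumes decoder_weights[i] is the feature's
    decoder weight on neuron i.
    """
    l1_norm = sum(abs(w) for w in decoder_weights)
    # Rank neurons by signed weight, largest first.
    ranked = sorted(range(len(decoder_weights)),
                    key=lambda i: decoder_weights[i], reverse=True)
    return [(i, decoder_weights[i], abs(decoder_weights[i]) / l1_norm)
            for i in ranked[:k]]


# Toy decoder vector over 5 neurons (made-up numbers, not this feature's weights).
weights = [0.05, -0.20, 0.40, 0.10, 0.25]
for index, value, share in top_neuron_alignment(weights):
    print(f"{index:>5}  {value:+.2f}  {share:.1%}")
```

With these toy weights the L₁ norm is 1.0, so each share is simply the weight itself (neuron 2 accounts for 40.0%).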
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1265    +0.13           0.04
1575    +0.11           0.04
662     +0.11           0.03
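The two columns above report different similarity measures. The sketch below shows how each is defined for a pair of vectors. It assumes the correlation is taken between the feature's activations and one neuron's activations over the same tokens. The dashboard does not say which pair of vectors its Cos Sim. column compares, so applying cosine similarity to the same activation sequences here is only an illustration. The activation values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up activations of one feature and one neuron over six tokens.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8, 0.0]
neuron_acts = [0.1, -0.3, 0.9, 0.2, 0.4, -0.1]
print(f"P. Corr.: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim.: {cosine_similarity(feature_acts, neuron_acts):.2f}")
```

Pearson correlation subtracts each sequence's mean before comparing, whereas cosine similarity does not. For that reason the two numbers can differ noticeably even for the same pair of vectors.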
Negative Logits
Sén          -0.83
Docteur      -0.78
unspeak      -0.76
Jusqu        -0.74
Autre        -0.72
Cinéma       -0.71
Rois         -0.71
Président    -0.71
Secrétaire   -0.69
Chapitre     -0.69
Positive Logits
less      0.93
LESS      0.86
Less      0.82
Less      0.81
less      0.75
menos     0.71
weniger   0.71
LESS      0.70
than      0.66
meno      0.64
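A minimal sketch of how the Negative Logits and Positive Logits lists could be produced. It starts from a mapping of each vocabulary token to a weight and keeps the k most negative and k most positive entries. This assumes each weight is the feature's direct contribution to that token's logit, for instance its decoder direction multiplied by the model's unembedding matrix, which the dashboard does not state. The helper `logit_extremes` is hypothetical. The toy mapping reuses a few values from the lists and adds two made-up near-zero tokens.

```python
def logit_extremes(logit_weights, k=3):
    """Return the k most negative and the k most positive (token, weight) pairs."""
    ranked = sorted(logit_weights.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy weights: several values from the lists above plus two made-up neutral tokens.
weights = {"less": 0.93, "weniger": 0.71, "than": 0.66, "the": 0.01,
           "and": -0.02, "Sén": -0.83, "Docteur": -0.78, "Rois": -0.71}
neg, pos = logit_extremes(weights)
print("Negative Logits:", neg)
print("Positive Logits:", pos)
```

Keying on token strings is a simplification. The repeated "Less", "less", and "LESS" entries above suggest that the real lists distinguish tokens that render alike, plausibly by leading whitespace. Keying on token IDs would therefore be safer.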
Activation Density: 0.103%
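Activation density is read here as the fraction of tokens on which the feature is active. The sketch below counts activations above a threshold, assuming a threshold of zero. The dashboard does not state the exact cutoff or dataset it used. The helper name and the activation data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation above threshold)."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up activations over 2,000 tokens: the feature fires on 2 of them.
acts = [0.0] * 2000
acts[10] = 1.5
acts[777] = 0.3
print(f"Activation Density: {activation_density(acts):.3%}")
```

Here 2 of 2,000 tokens gives 0.100%. By this measure, a density of 0.103% means the feature fires on roughly one token in a thousand.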