INDEX
Explanations
instances where the term "favor" is used in various contexts
Neuron Alignment
Index    Value    % of L₁
67       +0.14    0.6%
1339     +0.14    0.5%
1892     +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1052     +0.14       0.03
420      +0.14       0.03
1892     +0.12       0.02
Negative Logits
Gré        -0.45
êncio      -0.43
Rha        -0.43
ste        -0.43
ittoria    -0.42
UIC        -0.42
Ros        -0.41
josé       -0.41
cyclo      -0.41
Crema      -0.40
Positive Logits
favor       1.17
favors      1.11
Favor       1.10
favor       1.02
favoring    1.00
favour      0.98
Fav         0.96
favored     0.96
FAVOR       0.96
Favor       0.95
Activations Density 0.091%