INDEX
Explanations
words related to favor or preference
favoring or promoting
Negative Logits
Schmitz      -0.48
Bernadette   -0.47
Schreiber    -0.45
Jules        -0.45
Christophe   -0.44
PC           -0.43
Kidd         -0.43
¡¡           -0.43
nhật         -0.43
Carla        -0.42
Positive Logits
favor        0.97
favour       0.94
Favor        0.93
Fav          0.93
favor        0.89
Favor        0.88
favoring     0.88
Fav          0.86
favored      0.85
FAVOR        0.85
Activations Density 0.012%
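
To give a sense of scale for a density figure like the one above, here is a minimal sketch. It assumes that activation density means the fraction of sampled token positions on which the feature activates above zero; that definition, the helper name `activation_density`, and the sample data are all illustrative assumptions, not something stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard may compute it differently.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 token positions, 12 of which fire.
sample = [0.0] * 99_988 + [1.5] * 12
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.012%
```

Under that assumption, a density of 0.012% corresponds to the feature firing on roughly 12 out of every 100,000 tokens.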