Explanations
phrases indicating preferences or choices
Negative Logits
russes    -0.73
poffe     -0.70
Reſ       -0.70
itſelf    -0.70
ſelf      -0.69
houſe     -0.68
himſelf   -0.66
käse      -0.66
Conſ      -0.65
alve      -0.64
Positive Logits
about     1.75
ABOUT     1.71
ABOUT     1.63
About     1.56
abt       1.48
About     1.47
bout      1.40
about     1.40
Bout      1.35
Bout      1.26
Activations Density 0.114%