Explanations
the connecting word "then" in various contexts
Negative Logits
Harlow     -0.78
Bär        -0.71
Irm        -0.71
Klagen     -0.69
Folsom     -0.69
himſelf    -0.68
лися       -0.68
Winfrey    -0.67
Kongo      -0.67
Ophelia    -0.66
Positive Logits
THEN       1.30
THEN       1.25
then       1.15
Then       1.15
Then       1.04
+#+        1.02
then       1.01
dann       0.95
Dann       0.95
Dann       0.94
Activations Density 0.152%