INDEX
Explanations
instances of the word "yet" and its variations, indicating a focus on contrasting or juxtaposing ideas
Negative Logits
Token     Logit
aggio     -0.16
opers     -0.15
ught      -0.14
ddit      -0.14
oley      -0.14
iginal    -0.14
ELSE      -0.14
erd       -0.14
atern     -0.13
ledo      -0.13
Positive Logits
Token     Logit
somehow   0.27
ting      0.24
ters      0.21
tings     0.20
forth     0.20
ter       0.20
Somehow   0.19
ta        0.19
-to       0.17
despite   0.17
Activations Density: 0.022%