INDEX
Explanations
phrases indicating a contrast or contradiction
the word "how" in various contexts
Negative Logits
Token       Logit
uthor       -0.65
agonists    -0.63
Grail       -0.62
lehem       -0.61
Guer        -0.58
iculture    -0.57
ception     -0.57
agonist     -0.56
isher       -0.56
Yard        -0.56
Positive Logits
Token       Logit
soever      1.08
HCR         0.86
beit        0.86
ever        0.82
ls          0.82
ling        0.81
ells        0.77
itzer       0.76
much        0.75
MUCH        0.74
Activations Density: 0.084%