INDEX
Explanations
phrases expressing speculative or counterfactual situations
the phrase "would have" and its variations
Negative Logits
muse        -0.63
rumor       -0.60
narrator    -0.59
plaintiff   -0.57
disinfect   -0.55
doll        -0.54
Cold        -0.54
trope       -0.54
reminder    -0.53
marqu       -0.53
Positive Logits
been        1.03
gotten      0.93
been        0.93
¶           0.89
ĸļ          0.89
Ģ           0.80
taken       0.79
ãĥ´ãĤ¡      0.78
Been        0.78
gotten      0.76
Activations Density 0.076%
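
The density figure above is presumably the share of sampled token positions on which this feature activates. The page does not define it, so that reading is an assumption. Under that assumption, a minimal sketch of the arithmetic follows; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only to reproduce a value of 0.076%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions on which
    the feature is active; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 token positions, 76 of them active.
sample = [0.0] * 99_924 + [1.5] * 76
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would mean the feature fires on fewer than one token in a thousand, which fits a feature tied to a narrow construction such as "would have" followed by a past participle.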