INDEX
Explanations
phrases that introduce a statement or argument
occurrences of the word "that" in various contexts
Negative Logits
aukee      -0.71
en         -0.69
backer     -0.68
gallery    -0.68
ien        -0.65
raq        -0.64
Guard      -0.63
wn         -0.62
anie       -0.62
AMY        -0.61
Positive Logits
they         0.75
contradicts  0.75
'[           0.73
justifies    0.70
witches      0.69
"#           0.69
"[           0.69
someday      0.67
we           0.65
somehow      0.64
Activations Density 0.269%