INDEX
Explanations
instances of the word "that."
Negative Logits
Token      Logit
Guard      -0.74
lean       -0.69
aq         -0.67
oses       -0.66
gur        -0.63
ax         -0.59
´          -0.59
aukee      -0.59
le         -0.59
respect    -0.59
Positive Logits
Token      Logit
they       0.99
THEY       0.92
there      0.92
soever     0.91
we         0.88
nobody     0.86
unlike     0.81
*/(        0.80
it         0.78
although   0.78
Activations Density: 0.159%
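
As a rough illustration of how a density figure like this could be read, the sketch below assumes it is the fraction of token positions at which the feature's activation is above zero. The page itself does not define the metric, so that reading, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold`. The source does not state this.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 159 of which are active.
acts = [1.0] * 159 + [0.0] * (100_000 - 159)
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.159% would mean the feature is active on roughly 16 of every 10,000 tokens sampled.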