INDEX
Explanations
phrases that include the word "note" or "that."
Negative Logits
nett      -0.16
arters    -0.14
ye        -0.14
XB        -0.14
rag       -0.14
Hacker    -0.14
nant      -0.14
ucht      -0.13
orate     -0.13
ickers    -0.13
Positive Logits
uyen         0.15
unlike       0.15
quam         0.14
ay           0.14
å½           0.14
æŀ           0.14
ikel         0.14
ENCHMARK     0.13
ATEGORY      0.13
ERY          0.13
Activations Density: 0.038%