Explanations
the word "that" in various contexts
Negative Logits
Token    Logit
YES      -0.07
hence    -0.06
istra    -0.06
âm       -0.06
کت       -0.06
ctors    -0.06
gend     -0.06
elic     -0.06
طبق      -0.06
och      -0.06
Positive Logits
Token    Logit
only     0.10
also     0.08
arda     0.07
efore    0.07
hardly   0.07
plá      0.07
weeney   0.07
mere     0.07
chỉ      0.07
yre      0.07
Activations Density 0.020%