INDEX
Explanations
Phrases containing the word "the" followed by a specific word.
Phrases that indicate comparison or contrast involving the word "the".
Negative Logits
Token           Logit
iband           -0.73
frey            -0.72
Became          -0.70
respectively    -0.70
Accessed        -0.69
anew            -0.66
isin            -0.66
apiece          -0.65
fw              -0.64
exceeded        -0.62
Positive Logits
Token           Logit
rest            1.40
ones            1.24
originals       1.17
usual           1.14
others          1.14
original        1.09
previous        1.09
likes           1.06
aforementioned  1.02
norm            1.02
Activations Density 0.187%