INDEX
Explanations
phrases indicating knowledge or reaction to information
negations or phrases indicating what something is not
Negative Logits
umbnail    -0.71
ourses     -0.70
çļ         -0.70
former     -0.70
oided      -0.66
papers     -0.63
WAY        -0.63
åº         -0.62
send       -0.60
ixel       -0.60
Positive Logits
icable       1.12
uncommon     1.12
easy         1.06
necessarily  1.05
advisable    1.01
raining      1.01
impossible   0.95
eworthy      0.95
feasible     0.92
surprising   0.92
Activations Density 0.100%
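
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero; the page does not state its definition, so this is an assumption. The function name `activation_density` and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.100% would correspond to the feature being active on roughly one token in a thousand.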