INDEX
Explanations
phrases expressing disagreement or disillusionment
statements that highlight negation or contradiction regarding expectations or beliefs
Negative Logits
redes          -0.78
版             -0.69
ACTIONS        -0.66
asionally      -0.63
periodically   -0.62
YES            -0.61
姫             -0.61
margins        -0.61
unsus          -0.60
)))            -0.59
Positive Logits
anymore         1.72
nor             1.34
yet             1.09
necessarily     0.98
slightest       0.90
anybody         0.85
anywhere        0.83
yet             0.81
necess          0.78
any             0.78
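Dashboards for sparse-autoencoder features commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the most negative and most positive tokens. The sketch below assumes that convention. It uses a hypothetical toy model (d_model = 2, four tokens) and hypothetical function names, not this dashboard's actual data or code:

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by dotting the feature's decoder direction
    (length d_model) with each column of the unembedding (d_model x vocab)."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
tokens = ["anymore", "nor", "redes", "YES"]
decoder_dir = [1.0, 0.5]
unembed = [[1.0, 1.0, -0.5, -0.4],
           [0.5, 0.0, -0.5, -0.3]]

scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(tokens, scores, k=2)
print("Positive:", positive)
print("Negative:", negative)
```

Read this way, a feature whose decoder direction pushes up "anymore", "nor" and "yet" raises the probability of negative-polarity continuations. That fits the explanations given for this feature.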
Activation Density: 0.316%
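Activation density is the share of sampled tokens on which the feature is active. A value of 0.316% works out to roughly 3 tokens in every 1,000. The sketch below is minimal and makes two assumptions: a feature counts as active when its activation is strictly above a threshold of zero, and density is measured over some sample of dataset tokens. The function name and the sample are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of sampled tokens on which the feature fires above `threshold`
    (the zero threshold is an assumption, not taken from this dashboard)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 316 firing tokens out of 100,000.
sample = [0.0] * 99_684 + [1.2] * 316
print(f"Activation density: {activation_density(sample):.3f}%")
```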