Explanations
phrases indicating something is not supported, compatible, allowed, reflected, or directly executable
negations related to compatibility and functionality
Negative Logits
Token      Logit
love       -0.71
iency      -0.69
Topics     -0.68
Fol        -0.68
Nice       -0.67
papers     -0.66
Love       -0.66
haven      -0.66
GRE        -0.66
little     -0.65
Positive Logits
Token        Logit
necessarily  1.12
affected     1.03
permitted    1.02
applicable   0.94
included     0.92
allowed      0.92
icable       0.91
affect       0.90
guaranteed   0.87
counted      0.87
Activations Density 0.205%