Explanations
phrases related to permission or capability
modal verbs indicating capability or possibility
Negative Logits
insofar       -0.68
Mant          -0.61
danger        -0.59
Choice        -0.59
Cheong        -0.58
Hits          -0.57
hates         -0.57
Moz           -0.57
irony         -0.57
repre         -0.57
Positive Logits
't             1.32
afford         1.03
isters         0.92
continue       0.91
communicate    0.90
cope           0.89
ister          0.87
fend           0.87
begin          0.85
satisfy        0.85
Activations Density: 0.172%