INDEX
Explanations
modal verbs or phrases indicating impossibility or necessity
negation or phrases indicating impossibility
Negative Logits
Token        Logit
ahime        -0.58
CI           -0.57
iens         -0.56
Cutter       -0.55
gat          -0.55
imity        -0.54
inspecting   -0.53
Gat          -0.51
naming       -0.51
Person       -0.51
Positive Logits
Token        Logit
be           1.19
possibly     1.12
withstand    0.98
feas         0.97
happen       0.94
plaus        0.92
exist        0.92
possibly     0.89
occur        0.89
Possibly     0.86
Activations Density: 0.108%