Explanations
verbs indicating the ability to perform an action or the impossibility of an action
phrases expressing limitations or prohibitions
Negative Logits
Token             Logit
rongh             -0.62
assorted          -0.60
leted             -0.58
ghan              -0.58
mull              -0.57
rather            -0.56
rather            -0.56
unsurprisingly    -0.56
ischer            -0.55
JO                -0.54
Positive Logits
Token             Logit
anymore            1.39
nor                1.06
unless             0.89
anything           0.88
urate              0.85
adequately         0.85
anywhere           0.79
anybody            0.79
coherent           0.78
satisf             0.78
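Tables like the two above are commonly produced by projecting a feature's decoder (write) direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that method; the page itself does not state how its logits were computed. The function names, the 3-dimensional vectors, and the token names "tok_a" through "tok_d" are hypothetical and are not this feature's real weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect,
# ASSUMING effect(token) = dot(decoder_dir, unembedding column for token).
# All vectors and token names below are invented for illustration.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])} for every token.

    decoder_dir: list of floats, length d_model (the feature's write direction)
    unembed: dict mapping token -> list of floats, length d_model
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_k(effects, k, largest=True):
    """Return the k (token, score) pairs with the largest (or smallest) scores."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional residual stream and 4-token vocabulary.
    decoder_dir = [1.0, 0.5, -0.2]
    unembed = {
        "tok_a": [1.0, 0.5, 0.0],
        "tok_b": [0.8, 0.2, 0.1],
        "tok_c": [-0.5, 0.0, 0.5],
        "tok_d": [0.0, -1.0, 1.0],
    }
    effects = logit_effects(decoder_dir, unembed)
    print("positive:", top_k(effects, 2))
    print("negative:", top_k(effects, 2, largest=False))
```

Here "tok_a" (about 1.25) and "tok_b" (about 0.88) land in the positive list, while "tok_d" (about -0.7) and "tok_c" (about -0.6) land in the negative list. A real dashboard would run the same dot product over the full vocabulary.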
Activation density: 0.352%
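An activation density of 0.352% works out to roughly 3.5 activating tokens per 1,000. The sketch below assumes that "density" means the fraction of tokens where the feature's activation is strictly positive. The function name and the activation values are hypothetical and for illustration only.

```python
# Minimal sketch, ASSUMING "activation density" = fraction of tokens on which
# the feature's activation is strictly positive. Values are invented.

def activation_density(activations):
    """Return the fraction of entries in `activations` that are > 0."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over 1,000 tokens; the feature fires on 4 of them.
    acts = [0.0] * 1000
    for i in (10, 250, 251, 900):
        acts[i] = 1.3
    print(f"{activation_density(acts):.3%}")  # prints 0.400%
```

A sparse feature like this one stays at zero on nearly every token, so density is a quick check on how selective it is.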