INDEX
Explanations
phrases related to abilities or actions
phrases related to limitations and inability
Negative Logits
ghan        -0.64
naire       -0.63
ollah       -0.63
leted       -0.62
rongh       -0.59
mull        -0.58
rather      -0.57
ortment     -0.57
rather      -0.56
igl         -0.55
Positive Logits
anymore     1.37
nor         0.95
anything    0.91
urate       0.86
uate        0.84
any         0.81
anybody     0.81
anywhere    0.76
adequately  0.75
anyone      0.74
Activations Density 0.200%