INDEX
Explanations
phrases related to the ability to do something
expressions of capability or possibility
Negative Logits
furt       -0.76
Irving     -0.70
revision   -0.69
rejection  -0.66
Strikes    -0.63
Cance      -0.62
UR         -0.61
Likes      -0.61
Uri        -0.58
IB         -0.57
Positive Logits
't         1.45
berra      1.19
adian      1.12
afford     1.01
NOT        0.96
muster     0.89
feas       0.87
nery       0.85
ieve       0.85
be         0.80
Activations Density 0.170%