INDEX
Explanations
predictions or possibilities
modal verbs indicating possibility or uncertainty
Negative Logits
perty        -0.67
akers        -0.66
rency        -0.64
Enhancement  -0.63
usha         -0.63
avis         -0.58
ologue       -0.58
Named        -0.58
performing   -0.58
vis          -0.58
Positive Logits
happen       1.25
mean         1.22
imply        1.01
entail       1.00
translate    0.99
explain      0.98
mean         0.98
horr         0.92
result       0.91
means        0.91
Activations Density: 0.131%