INDEX
Explanations
instances where an action or modification can be made
phrases indicating capability or possibility
Negative Logits
Token         Logit
pires         -0.68
Hits          -0.68
burgh         -0.66
Mant          -0.65
Strikes       -0.65
Appears       -0.63
arthed        -0.62
favors        -0.62
forthcoming   -0.62
Attention     -0.62
Positive Logits
Token         Logit
't            1.55
NOT           1.16
choose        1.00
customize     0.99
find          0.96
optionally    0.95
expect        0.94
learn         0.91
berra         0.91
ister         0.91
Activations Density: 0.104%