INDEX
Explanations
phrases related to enabling actions or functionalities
phrases indicating the functionality or capabilities of tools and applications
Negative Logits
boy       -0.79
ta        -0.71
borough   -0.70
bil       -0.67
wa        -0.67
xon       -0.64
source    -0.64
bons      -0.64
tone      -0.60
town      -0.59
Positive Logits
Reviewer      0.98
geries        0.90
Allows        0.83
uces          0.77
us            0.77
hift          0.73
withdrawals   0.72
ibaba         0.72
bidden        0.71
ences         0.71
Activations Density 0.043%