Explanations
phrases related to personal preferences or favorites
statements that express personal preferences or favorites
Negative Logits
uum       -0.79
iates     -0.70
urches    -0.69
iate      -0.68
ients     -0.67
verett    -0.66
merits    -0.65
ynes      -0.64
harms     -0.63
ples      -0.62
Positive Logits
undoubtedly    1.00
definitely     0.84
probably       0.83
ometric        0.81
indeed         0.80
called         0.77
ovie           0.73
doubtless      0.73
Solitaire      0.72
abi            0.71
Activations Density 0.198%