INDEX
Explanations
phrases related to expressing personal preferences or opinions
expressions of recommendation or personal opinion
Negative Logits
Token           Logit
glimps          -0.65
Definitions     -0.63
Detail          -0.62
Contains        -0.61
revelations     -0.61
facts           -0.61
Investigative   -0.61
constit         -0.60
contained       -0.58
unforeseen      -0.58
Positive Logits
Token           Logit
recommend       1.15
prefer          0.98
recommending    0.95
gladly          0.94
bet             0.94
hesitate        0.92
endorse         0.88
laughs          0.83
urge            0.82
votes           0.82
Activations Density: 0.251%