Explanations
phrases related to desires or interests
expressions of desire and willingness
Negative Logits
ummies      -0.78
rooms       -0.70
eworks      -0.67
ules        -0.66
Featured    -0.66
IDs         -0.65
elines      -0.65
units       -0.65
sites       -0.64
edited      -0.64
Positive Logits
rence         0.81
familiarity   0.79
mindset       0.70
amiliar       0.70
mentality     0.69
willingness   0.68
linkage       0.68
trove         0.68
stemming      0.67
attitude      0.67
Activations Density 0.256%