INDEX
Explanations
phrases indicating necessity or high importance
phrases expressing standards, recommendations, or essential items
Negative Logits
oneliness       -0.70
ancies          -0.68
apy             -0.67
cffff           -0.66
amar            -0.65
agate           -0.64
politics        -0.63
ternity         -0.63
idine           -0.63
Sapp            -0.63
Positive Logits
favourite        0.85
contender        0.83
favorite         0.83
recourse         0.82
motivating       0.82
recommendation   0.81
culprit          0.80
rallying         0.79
priority         0.78
Topic            0.77
Activation Density: 0.366%