Explanations
expressions of affection or interest in a positive manner
expressions of desire or affection for something
Negative Logits
Token        Logit
ongh         -0.78
obs          -0.62
utical       -0.61
ritical      -0.60
okes         -0.59
VER          -0.58
quartered    -0.58
arij         -0.58
arding       -0.57
advant       -0.57
Positive Logits
Token        Logit
someday      1.05
ĺħ           0.83
anytime      0.70
morrow       0.66
Logged       0.64
lett         0.63
eday         0.63
leased       0.60
feedback     0.60
if           0.59
Activations Density: 0.184%