Explanations
descriptions or comparisons emphatically starting with "Like"
the word "Like" used in various contexts to indicate preference or approval
Negative Logits
Token     Logit
ennes     -0.87
enthusi   -0.81
tein      -0.79
ape       -0.75
rift      -0.74
hiba      -0.73
dies      -0.71
Americ    -0.70
aley      -0.69
duct      -0.68
Positive Logits
Token     Logit
lihood     1.89
liest      1.28
lier       1.15
liness     0.97
minded     0.88
ours       0.83
minded     0.82
ly         0.81
clock      0.74
ably       0.72
Activation Density: 0.050%