Explanations
comparisons using the word "like"
comparisons or similes
Negative logits
Token          Logit
icators        -0.87
ocamp          -0.81
ulty           -0.80
ourse          -0.79
Published      -0.78
bard           -0.77
icator         -0.75
gments         -0.74
icity          -0.74
Supported      -0.74
Positive logits
Token          Logit
liest           1.01
lihood          0.99
lier            0.96
wildfire        0.71
wink            0.68
coincidence     0.67
goodbye         0.65
spitting        0.65
dé              0.65
Craigslist      0.64
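Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the highest- and lowest-scoring tokens. The sketch below is a minimal pure-Python version of that idea and rests on several assumptions. The names `decoder_dir`, `unembed` and `vocab` are hypothetical inputs. The toy numbers are illustrative only, not the values behind this page. A real dashboard may also normalize or scale these scores.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_dir: hypothetical feature decoder vector, length d_model.
    unembed: hypothetical d_model x vocab_size unembedding matrix.
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of decoder_dir with column j of unembed.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example with a 2-dimensional model and four tokens.
vocab = ["liest", "lihood", "icators", "ocamp"]
decoder_dir = [1.0, -1.0]
unembed = [
    [0.5, 0.5, -0.5, 0.0],
    [-0.5, 0.0, 0.5, 0.25],
]
neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)  # [('icators', -1.0), ('ocamp', -0.25)]
print(pos)  # [('liest', 1.0), ('lihood', 0.5)]
```

With this toy data, "liest" is pushed up most strongly and "icators" is pushed down most strongly. That matches the direction, though not the magnitudes, of the lists above.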
Activation density: 0.038%
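Activation density is usually read as the fraction of token positions on which the feature fires. The page does not state its exact definition, so treat the following as an assumption: "fires" means an activation strictly above zero. The helper `activation_density` is a hypothetical name for this sketch. Under that reading, 0.038% means about 38 active tokens in every 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes "active" means strictly greater than the threshold (default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 38 active positions out of 100,000 tokens.
acts = [0.0] * 99_962 + [1.2] * 38
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.038%
```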