INDEX
Explanations
phrases indicating similarity or comparison
comparisons expressing similarity
Negative Logits
Token      Logit
alez       -0.93
ourse      -0.87
ESA        -0.83
Ö¼         -0.83
isexual    -0.82
inion      -0.80
onding     -0.78
alt        -0.76
Cause      -0.75
ipolar     -0.74
Positive Logits
Token      Logit
lier       1.09
lihood     1.02
liest      0.96
Andromeda  0.74
crap       0.72
gib        0.70
flame      0.69
Carth      0.66
lifeless   0.66
liness     0.66
Activations Density: 0.033%