INDEX
Explanations
the word "like" in various contexts
Negative Logits
jee       -0.18
habit     -0.16
bbing     -0.16
ufe       -0.15
REA       -0.15
chos      -0.14
ocache    -0.14
rieve     -0.14
iveau     -0.14
vů        -0.14
Positive Logits
ide        0.17
ider       0.16
ento       0.15
vengeance  0.15
Pent       0.15
ides       0.14
vier       0.14
Loy        0.14
apos       0.14
Morm       0.14
Activations Density: 0.013%