Explanations
the word "like."
expressions of preference or liking
Negative Logits
arta          -1.00
ilion         -0.80
lehem         -0.78
ureau         -0.76
enthusi       -0.74
PATH          -0.73
inas          -0.72
edia          -0.71
LM            -0.71
SourceFile    -0.68
Positive Logits
lihood        1.35
ably          0.93
lier          0.84
watching      0.82
seeing        0.82
liest         0.79
surprises     0.76
liness        0.75
hearing       0.73
spicy         0.71
Activations Density: 0.056%