INDEX
Explanations
details about things that people like or enjoy
instances of the word "likes."
Negative Logits
§            -0.68
borne        -0.64
examination  -0.64
ANCE         -0.63
imony        -0.61
mit          -0.61
athon        -0.59
Impl         -0.58
sequence     -0.58
AMERICA      -0.58
Positive Logits
likes        3.91
Likes        1.82
loves        1.62
liked        1.56
prefers      1.49
hates        1.38
liking       1.37
wants        1.27
favourites   1.20
enjoys       1.12
Activations Density 0.012%
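
For readers unfamiliar with the density figure above, here is a minimal sketch of one common way to read it: the fraction of sampled tokens on which the feature fires at all. This is an assumption about the definition, not something the dashboard states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# active tokens) / (# tokens sampled).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 25,000 tokens.
sample = [0.0] * 24_997 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.012% means the feature is active on roughly 1 token in every 8,300, which fits a feature that responds to one specific word such as "likes".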