INDEX
    Explanations

    details about things that people like or enjoy

    instances of the word "likes."

    Negative Logits
    Token           Logit
    "§"             -0.68
    " borne"        -0.64
    "examination"   -0.64
    "ANCE"          -0.63
    "imony"         -0.61
    "mit"           -0.61
    "athon"         -0.59
    "Impl"          -0.58
    "sequence"      -0.58
    " AMERICA"      -0.58
    Positive Logits
    Token           Logit
    " likes"        3.91
    " Likes"        1.82
    " loves"        1.62
    " liked"        1.56
    " prefers"      1.49
    " hates"        1.38
    " liking"       1.37
    " wants"        1.27
    " favourites"   1.20
    " enjoys"       1.12
    Activation Density: 0.012%

    No Known Activations