    Explanations

    the word "like."

    expressions of preference or liking

    Negative Logits
    "arta"        -1.00
    "ilion"       -0.80
    "lehem"       -0.78
    "ureau"       -0.76
    " enthusi"    -0.74
    "PATH"        -0.73
    "inas"        -0.72
    "edia"        -0.71
    "LM"          -0.71
    "SourceFile"  -0.68
    Positive Logits
    "lihood"      1.35
    "ably"        0.93
    "lier"        0.84
    " watching"   0.82
    " seeing"     0.82
    "liest"       0.79
    " surprises"  0.76
    "liness"      0.75
    " hearing"    0.73
    " spicy"      0.71
    Activation Density: 0.056%

    No Known Activations
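    The activation density of 0.056% is most naturally read as the share of token positions on which this feature fires. The sketch below computes that share from a list of per-token activations. It is an illustration under assumptions: the function name activation_density, the zero threshold, and the sample data are hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means count(active) / count(all positions);
    the zero threshold is a hypothetical default, not a documented value.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical data: 56 firing positions out of 100,000 tokens.
sample = [0.0] * 99_944 + [1.5] * 56
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.056%
```

    At this density the feature fires on roughly one token in every 1,800, which fits a narrow trigger such as the word "like".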