INDEX
    Explanations

    comparisons using the word "like"

    Negative Logits
    icators       -0.87
    ocamp         -0.81
    ulty          -0.80
    ourse         -0.79
    Published     -0.78
    bard          -0.77
    icator        -0.75
    gments        -0.74
    icity         -0.74
    Supported     -0.74
    Positive Logits
    liest         1.01
    lihood        0.99
    lier          0.96
     wildfire     0.71
     wink         0.68
     coincidence  0.67
     goodbye      0.65
     spitting     0.65
     dé           0.65
     Craigslist   0.64
    Act Density 0.038%

    No Known Activations