INDEX
    Explanations

    comparisons using the word "like"

    instances of the word "like"

    Negative Logits
    "ulty"        -0.87
    "chin"        -0.84
    "hiba"        -0.84
    "inoa"        -0.77
    "ourse"       -0.77
    "Dispatch"    -0.76
    "oard"        -0.74
    "rax"         -0.73
    "onte"        -0.73
    "idates"      -0.71
    Positive Logits
    "lihood"       1.67
    "lier"         1.02
    "liest"        0.95
    " ours"        0.94
    " minded"      0.90
    "minded"       0.90
    "liness"       0.89
    " wildfire"    0.80
    " hers"        0.76
    " yours"       0.73
    Act Density 0.118%

    No Known Activations