INDEX
    Explanations

    comparisons between objects or entities

    words related to resemblance or similarity

    Negative Logits
    "ourse"    -0.79
    "alloc"    -0.78
    "imb"      -0.75
    "arta"     -0.71
    "FT"       -0.71
    "load"     -0.70
    "alt"      -0.70
    "gard"     -0.69
    "mouth"    -0.69
    "deal"     -0.69
    Positive Logits
    "lihood"         1.71
    "lier"           0.94
    "liest"          0.85
    " likeness"      0.82
    " ours"          0.81
    "liness"         0.78
    "awei"           0.73
    " resembling"    0.72
    " lifeless"      0.71
    " theirs"        0.70
    Activation Density: 0.032%

    No Known Activations