    Explanations

    phrases or sentences that describe similarities or comparisons

    comparisons and similarities between different concepts or entities

    Negative Logits
    Token            Logit
    "ale"            -0.64
    "Explore"        -0.63
    "rollers"        -0.62
    "own"            -0.59
    " Glory"         -0.59
    " Lauderdale"    -0.57
    " Bild"          -0.57
    "rection"        -0.57
    " overfl"        -0.57
    "Published"      -0.56
    Positive Logits
    Token            Logit
    "lihood"         1.01
    " minded"        0.91
    "worldly"        0.84
    "icut"           0.84
    " twins"         0.80
    "minded"         0.78
    "ĸļ"             0.76
    "itably"         0.76
    "ively"          0.74
    "iated"          0.74
    Activation Density: 0.028%

    No Known Activations