    NEGATIVE LOGITS
    Token             Logit
    "similar"         -1.41
    " similar"        -1.37
    "Similar"         -1.30
    " Similar"        -1.24
    " similaire"      -1.16
    " SIMILAR"        -1.14
    " analogous"      -1.09
    " ähnlich"        -1.09
    " equivalent"     -1.09
    "equivalent"      -1.05
    POSITIVE LOGITS
    Token             Logit
    "ily"             0.51
    " to"             0.47
    "OfClass"         0.47
    "ly"              0.46
    "ij"              0.46
    "mehr"            0.45
    " polity"         0.45
    " mellom"         0.45
    "ist"             0.45
    " enough"         0.45
    Activation Density: 0.078%

    No Known Activations