INDEX
    Explanations

    patterns of comparison and equivalence in descriptions

    Negative Logits
     similarly    -0.87
    similar       -0.77
    Similarly     -0.75
     Similarly    -0.74
     Similar      -0.72
    Similar       -0.69
     similar      -0.68
    Viited        -0.67
     SIMILAR      -0.64
     calendriers  -0.64
    Positive Logits
     exact         0.86
     ſame          0.82
     sane          0.82
     sam           0.82
     self          0.79
     же            0.78
     zelf          0.78
     sae           0.77
     samym         0.76
     saine         0.75
    Act Density 0.145%

    No Known Activations