INDEX
    Explanations

    the word "same" and its variations in different contexts

    Negative Logits
    Token         Logit
    "lio"         -0.17
    " own"        -0.17
    "ses"         -0.16
    "cas"         -0.15
    "self"        -0.14
    "ion"         -0.14
    "untas"       -0.14
    "rious"       -0.14
    " similar"    -0.13
    "aint"        -0.13
    Positive Logits
    Token         Logit
    "-sex"        0.37
    " exact"      0.29
    " thing"      0.29
    " kind"       0.24
    "exact"       0.23
    "ãģı"         0.23
    " sort"       0.23
    "-old"        0.22
    " amount"     0.22
    "Exact"       0.21
    Activation Density: 0.053%

    No Known Activations