INDEX
    Explanations

    the word "only" in various contexts

    Negative Logits
    Token             Logit
    "aly"             -0.16
    "spÄĽ"            -0.15
    "اÙĦÙī"           -0.15
    "æĥij"            -0.15
    "ERM"             -0.15
    "_categorical"    -0.15
    "arges"           -0.14
    "ább"             -0.14
    "ophon"           -0.14
    "arget"           -0.14

    Positive Logits
    Token             Logit
    " thing"           0.22
    " rarely"          0.20
    " few"             0.20
    " recently"        0.18
    " when"            0.18
    " after"           0.17
    " fools"           0.16
    " Thing"           0.16
    " problem"         0.16
    "when"             0.16
    Activation Density: 0.023%

    No Known Activations