INDEX
    Explanations

    phrases indicating potential or capability

    Negative Logits
    ccak          -0.15
     Reasons      -0.15
    utom          -0.14
    usi           -0.14
    ³             -0.14
    ãĥģãĥ¥        -0.14
    hints         -0.14
    ÏģÏī          -0.14
     OTHERWISE    -0.13
    SKIP          -0.13
    Positive Logits
     added        0.27
     mak          0.26
     distinction  0.25
     potential    0.24
     capability   0.24
     tendency     0.23
     advantage    0.23
     capacity     0.22
     same         0.22
     distinct     0.21
    Act Density 0.059%

    No Known Activations