    Negative Logits
    Token               Logit
    " Far"              -0.08
    "visitor"           -0.07
    "OV"                -0.07
    " poj"              -0.07
    " Deep"             -0.06
    "ocular"            -0.06
    " racer"            -0.06
    " Oscar"            -0.06
    " удар"             -0.06
    " Conor"            -0.06

    Positive Logits
    Token               Logit
    " meth"              0.18
    " Meth"              0.15
    "meth"               0.13
    "eth"                0.10
    " METH"              0.09
    " Beth"              0.09
    "ETH"                0.08
    "Beth"               0.08
    " policymakers"      0.07
    " tying"             0.07

    Activation Density: 0.010%

    No Known Activations