INDEX
    Explanations

    conveying core statements or rules

    Negative Logits
    ":「"           0.46
    "不想"          0.45
    "はじめ"        0.44
    ",「"           0.43
    "!”"           0.41
    "Anything"     0.41
    " afterwards"  0.41
    "这也是"        0.41
    " nahin"       0.41
    " ఇదే"         0.41
    Positive Logits
    " essentially"   0.74
    " basically"     0.71
    " básicamente"   0.62
    "basically"      0.62
    " simply"        0.60
    " basicamente"   0.58
    " principally"   0.57
    " Basically"     0.56
    "essentially"    0.56
    " simplesmente"  0.55
    Activation Density: 0.014%

    No Known Activations