INDEX
    Explanations

    obvious or simple checks

    Negative Logits
    " svm"          0.46
    " AMAZING"      0.46
    " excitedly"    0.41
    " aswell"       0.41
    "💗"            0.41
    " seulement"    0.41
    " שהוא"         0.40
    "まず"          0.40
    "จะต้อง"        0.40
    "(:"            0.40
    Positive Logits
    " yep"      0.95
    " Yep"      0.88
    "Yep"       0.84
    "Yeah"      0.76
    "Yup"       0.76
    " Yeah"     0.75
    " Yup"      0.74
    " yeah"     0.74
    " Looks"    0.71
    "yeah"      0.70
    Activation Density: 0.009%

    No Known Activations