    Explanations

    following introductory phrases

    Negative Logits
    " ถ้า"  0.89
    " lots"  0.77
    "ถ้า"  0.77
    " usually"  0.73
    " خیلی"  0.73
    " trying"  0.73
    " sometimes"  0.72
    " אבל"  0.71
    " kalau"  0.71
    " things"  0.70
    Positive Logits
    " Through"  0.96
    "Through"  0.88
    " Specifically"  0.88
    " Utilizing"  0.86
    "Specifically"  0.84
    " Currently"  0.82
    " Notably"  0.80
    " Following"  0.79
    " Leveraging"  0.76
    " Furthermore"  0.76
    Activation Density: 0.064%

    No Known Activations