INDEX
    Explanations

    first, important, note, before

    Negative Logits
    " opposit"           0.48
    " niezb"             0.44
    " negated"           0.41
    " respectivement"    0.39
    "少なくとも"          0.39
    " exces"             0.38
    "あとは"              0.38
    "ৃতা"                0.37
    " ተመሳሳይ"            0.37
    " alternatively"     0.36
    Positive Logits
    "前提"                0.81
    "首先"                0.78
    "NOTE"               0.76
    "ก่อน"                0.76
    "Abbreviations"      0.75
    " Before"            0.74
    " caveat"            0.74
    " caveats"           0.74
    " NOTE"              0.73
    " Disclaimer"        0.73
    Act Density 0.024%

    No Known Activations