INDEX
    Negative Logits
    "י"  0.89
    "ד"  0.88
    "শালী"  0.86
    "ي"  0.86
    " באתר"  0.84
    " repayment"  0.80
    "städter"  0.80
    "i"  0.80
    "ি"  0.79
    " utilisez"  0.79

    Positive Logits
    "agonist"  1.02
    "clockwise"  1.01
    "inflammatory"  1.01
    "Semitism"  0.98
    "dote"  0.92
    "ipasi"  0.91
    "bellum"  0.91
    "terrorism"  0.91
    "wasm"  0.90
    "submarine"  0.89

    Act Density 0.100%

    No Known Activations