    Negative Logits
     パネル    -0.92
     Bt    -0.89
     beträ    -0.84
    وسی    -0.81
     одном    -0.81
     върху    -0.80
     credibility    -0.80
     явля    -0.80
    𝜔    -0.78
    關於    -0.78
    Positive Logits
     thankful    1.70
     grateful    1.51
     thanking    1.48
     thanked    1.38
     for    1.37
     agradec    1.34
     Thanksgiving    1.30
     Thanks    1.21
     gratitude    1.20
     agrade    1.20
    Activation Density: 0.005%

    No Known Activations