    NEGATIVE LOGITS
     friendships
    -0.07
     vert
    -0.06
     \(
    -0.06
     negotiations
    -0.06
    _ROOT
    -0.06
    adaki
    -0.06
     reimb
    -0.06
     refugee
    -0.06
    われ
    -0.06
     Про
    -0.06
    POSITIVE LOGITS
    outside
    0.07
     eyeb
    0.07
    /go
    0.07
    loyd
    0.06
    MessageBox
    0.06
    ایط
    0.06
     ".
    0.06
    _SY
    0.06
    อบ
    0.06
     awarded
    0.06
    Activation Density: 0.005%

    No Known Activations