    NEGATIVE LOGITS
    " under"      -0.89
    " side"       -0.75
    " that"       -0.73
    "ohyd"        -0.73
    " of"         -0.73
    " abdomen"    -0.71
    " nicht"      -0.70
    " نمی"        -0.70
    "ghter"       -0.69
    "永恒"        -0.69
    POSITIVE LOGITS
    " Told"       1.03
    " told"       0.98
    "Told"        0.79
    "TRUST"       0.77
    "hängen"      0.77
    "THEORY"      0.76
    "Escola"      0.75
    " Iva"        0.74
    " iva"        0.74
    "zwungen"     0.74
    Activation Density: 0.040%

    No Known Activations