INDEX
    NEGATIVE LOGITS
    Token             Logit
    " pauvreté"       -0.45
    "MessageOf"       -0.43
    " hâte"           -0.42
    " Jerusalén"      -0.41
    " fiabilidad"     -0.40
    "berdayakan"      -0.40
    " toalha"         -0.40
    " vectorielle"    -0.40
    "circledR"        -0.40
    " Mackey"         -0.39
    POSITIVE LOGITS
    Token             Logit
    "inh"              2.41
    "INH"              1.73
    "nh"               1.45
    "anh"              1.41
    "inha"             1.11
    "inho"             1.10
    "Nh"               0.94
    "NH"               0.91
    "ynh"              0.90
    "Anh"              0.89
    Activation Density: 0.010%

    No Known Activations