    NEGATIVE LOGITS
    Token (quoted)    Logit
    "rande"           -0.68
    " للاسماء"        -0.66
    " Loma"           -0.63
    " berço"          -0.60
    "onde"            -0.58
    "tól"             -0.57
    " necessárias"    -0.56
    " Kendal"         -0.56
    " Rande"          -0.55
    "tale"            -0.55
    POSITIVE LOGITS
    Token (quoted)    Logit
    "Fish"            1.14
    " Fish"           1.11
    "fish"            1.05
    " fish"           1.03
    " FISH"           1.01
    "FISH"            1.00
    " Fisch"          0.80
    " poissons"       0.77
    " fished"         0.76
    " fishes"         0.75
    Activation density: 0.005%
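A density of 0.005% means the feature fires on roughly 1 in 20,000 tokens (0.005 / 100 = 5 × 10⁻⁵). This is a minimal sketch of that computation. It assumes density is the percentage of sampled tokens whose activation is strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# One firing token out of 20,000 reproduces the 0.005% figure.
acts = [0.0] * 19_999 + [3.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.005%
```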

    No Known Activations