    Explanations

    countries and their actions

    Negative Logits
    'u'                 0.49
    'ed'                0.46
    'lardan'            0.44
    'larda'             0.42
    'ın'                0.41
    'ين'                0.40
    (not shown)         0.40
    'dır'               0.39
    'as'                0.39
    '?'                 0.39

    Positive Logits
    ' to'               0.38
    ' professores'      0.38
    (not shown)         0.37
    '۰'                 0.33
    '("'                0.33
    ' दोन'              0.32
    'ACCOUNT'           0.32
    ' ('                0.32
    ' investigadores'   0.32
    ' it'               0.31
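Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative per-token effects. The sketch below assumes that method. It is written in plain Python lists for self-containment, and the function name `top_logit_effects` and its parameters are hypothetical, not the dashboard's own code.

```python
from typing import List, Tuple

def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # unembed[i][v]: weight from residual dim i to vocab token v
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction through the
    unembedding and return the k most positive and k most negative token effects."""
    n_vocab = len(vocab)
    # effect[v] = sum over residual dims i of decoder_dir[i] * unembed[i][v]
    effects = [
        sum(d * row[v] for d, row in zip(decoder_dir, unembed))
        for v in range(n_vocab)
    ]
    # Sort ascending by effect so the most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative

# Toy usage: a 2-dimensional residual stream and a 4-token vocabulary.
pos, neg = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.1, 0.4, 0.0],
     [0.1, 0.3, -0.2, 0.6]],
    ["a", "b", "c", "d"],
    k=2,
)
```

In this sketch negative effects come out as signed negative numbers. Sorting the full vocabulary is simple but O(V log V); a real vocabulary of tens of thousands of tokens might instead use a partial selection such as `heapq.nlargest`.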
    Activation density: 0.264%
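An activation density of 0.264% means the feature is active on roughly 264 of every 100,000 tokens. The sketch below assumes density is the fraction of sampled token positions where the feature's activation is strictly above zero. The helper name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy usage: 264 firing positions out of 100,000 sampled tokens.
acts = [0.0] * 99_736 + [1.0] * 264
density = activation_density(acts)
```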

    No Known Activations