INDEX
    Explanations
    Negative Logits
     guerra         -0.07
    *T              -0.07
    bere            -0.06
    pto             -0.06
    War             -0.06
     awaken         -0.06
     हमल            -0.06
     accommodation  -0.06
     cautioned      -0.06
     채용            -0.06
    Positive Logits
     believable     0.07
     inflation      0.07
     classifier     0.06
    Li              0.06
     trade          0.06
     अज             0.06
     regulation     0.06
    材料              0.06
    pecting         0.06
    imit            0.06
    Act Density 0.001%

    No Known Activations