    Explanations

    negation or objection phrases

    Negative Logits
    "engertian"         -0.55
    "forName"           -0.53
    " zirc"             -0.53
    "alnız"             -0.53
    " marten"           -0.53
    "ORIAL"             -0.53
    " torchvision"      -0.52
    " GLASS"            -0.52
    "orten"             -0.52
    " regioni"          -0.51
    Positive Logits
    " etc"              0.92
    " <=","             0.74
    "etc"               0.73
    "kháu"              0.72
    " للمعارف"          0.70
    " Италијани"        0.67
    " "..\..\"          0.66
    " kasarigan"        0.64
    " Administrativna"  0.61
    "何より"             0.60
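    Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive vocabulary entries. The sketch below assumes that approach (it ignores any final layer norm); the function name top_logit_tokens, its arguments, and the toy data are hypothetical, not this dashboard's actual code.

```python
import random

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through an unembedding matrix and
    return the k most negative and k most positive (token, logit) pairs.

    decoder_dir: list of d_model floats (hypothetical feature decoder row)
    W_U: d_model x vocab_size nested list (hypothetical unembedding matrix)
    """
    vocab_size = len(vocab)
    # Dot product of the decoder direction with each unembedding column.
    logits = [
        sum(decoder_dir[j] * W_U[j][v] for j in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]
    order = sorted(range(vocab_size), key=lambda v: logits[v])  # ascending
    negative = [(vocab[v], logits[v]) for v in order[:k]]
    positive = [(vocab[v], logits[v]) for v in reversed(order[-k:])]
    return negative, positive

# Toy example with random weights (all values are made up for illustration).
random.seed(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = [random.gauss(0, 1) for _ in range(d_model)]
W_U = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=3)
print("most negative:", neg)
print("most positive:", pos)
```

    Sorting once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens a partial selection (for example heapq.nsmallest and heapq.nlargest) would avoid a full sort.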
    Activation Density: 0.237%
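    Activation density on dashboards of this kind is usually the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the function name and threshold parameter are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# 3 firing tokens out of 1,000 sampled tokens (made-up data).
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(acts):.3%}")  # 0.300%
```

    Under this definition, a density of 0.237% would mean the feature is active on roughly 2 to 3 tokens per thousand.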

    No Known Activations