    Explanations

    punctuation marks, particularly various quotation marks or apostrophes used in speech or dialogue

    Negative Logits
     becauſe            -0.61
     kvinder            -0.60
    respectively        -0.60
     bolig              -0.59
     problemer          -0.58
     nemlig             -0.58
     applicazioni       -0.58
     dramatist          -0.58
     stället            -0.57
    arbeid              -0.57

    Positive Logits
    Autoritní            0.87
     normal              0.69
     real                0.66
    normal               0.66
    hood                 0.63
    ifs                  0.63
    hot                  0.62
    real                 0.62
     jadx                0.61
     hard                0.61
    Act Density 0.158%

    No Known Activations