INDEX
    Explanations

    expressions of surprise or realization

    Negative Logits
    Token          Logit
    "vett"         -0.50
    " Virginie"    -0.45
    " Perse"       -0.45
    " Mette"       -0.44
    "cupine"       -0.43
    "ccb"          -0.43
    "mbangan"      -0.42
    " føl"         -0.42
    "veral"        -0.42
    "ellen"        -0.42
    Positive Logits
    Token          Logit
    " Ah"          0.71
    "Ah"           0.68
    " ApJ"         0.58
    "Ach"          0.57
    " Ach"         0.54
    " ah"          0.54
    " AH"          0.51
                   0.50
    " Ahl"         0.50
    "ah"           0.50
    Activation Density: 0.011%

    No Known Activations