INDEX
    Explanations

    words indicating the historical or original context of something

    Negative Logits

    Token             Logit
    " current"        -0.52
    "and"             -0.51
    "lle"             -0.51
    "ings"            -0.50
    " individual"     -0.49
    " Current"        -0.48
    "al"              -0.48
    " czo"            -0.48
    "ute"             -0.47
    "ic"              -0.47

    Positive Logits

    Token             Logit
    " originally"     2.74
    "Originally"      2.59
    "originally"      2.45
    " Originally"     2.40
    " originalmente"  2.30
    " ursprünglich"   1.99
    " initially"      1.84
    "Initially"       1.58
    " inicialmente"   1.57
    " Initially"      1.56

    Activation Density: 0.056%

    No Known Activations