INDEX
    Explanations

    start of new sentences or phrases

    Negative Logits
     multiplicative   0.38
     mollus           0.37
     logits           0.37
     dilution         0.36
     violência        0.36
    (no token shown)  0.36
     hadron           0.36
     sparsity         0.35
     carbonyl         0.35
     metac            0.35
    Positive Logits
    eny       0.44
    iti       0.44
    olic      0.44
    ulaire    0.41
    ela       0.40
    way       0.40
    ru        0.40
    ue        0.39
     Award    0.39
    archive   0.39
    Act Density 0.000%

    No Known Activations