    Explanations

    references to events or actions related to personal experiences

    Negative Logits
     ſeyn           -0.64
    iſche           -0.63
    <unused74>      -0.62
    <unused42>      -0.62
    𑄮               -0.62
    <unused79>      -0.62
    <unused14>      -0.62
    <unused8>       -0.61
    <unused3>       -0.61
    [@BOS@]         -0.61
    Positive Logits
     ActiveRecord   0.47
     những          0.41
    vedať           0.38
     vertelt        0.36
    strze           0.35
     các            0.35
    ParallelGroup   0.34
    WriteTagHelper  0.33
     Nederlandse    0.33
    ükemmel         0.31
    Activation Density: 0.010%

    No Known Activations