INDEX
    Explanations

    verbs indicating actions, processes, or changes

    Negative Logits
    "المشاركات"       -0.61
    "Couldn"          -0.57
    "<eos>"           -0.56
    "'"               -0.56
    "-"               -0.53
    " couldn"         -0.52
    " would"          -0.49
    "which"           -0.48
    " was"            -0.47
    "would"           -0.46
    Positive Logits
    " fhew"           1.01
    " Theſe"          0.99
    "AddTagHelper"    0.98
    " themſelves"     0.95
    " chofe"          0.92
    " uſe"            0.91
    " becauſe"        0.90
    " foncé"          0.90
    " myſelf"         0.89
    " ſtand"          0.86
    Activation Density: 0.436%

    No Known Activations