INDEX
    Explanations

    references to knowledge and its application or significance

    Negative Logits
    Token         Logit
    "\<^"         -0.19
    "ondheim"     -0.15
    "phen"        -0.15
    "az"          -0.15
    "ello"        -0.15
    "agli"        -0.15
    "oler"        -0.14
    "ross"        -0.14
    "ensburg"     -0.14
    "ucid"        -0.14
    Positive Logits
    Token         Logit
    " base"       0.30
    "base"        0.30
    "ably"        0.30
    "-base"       0.25
    " Base"       0.25
    "able"        0.24
    "gable"       0.21
    " gained"     0.21
    "Base"        0.20
    "_base"       0.20
    Activation Density: 0.024%

    No Known Activations