    Explanations

    words related to understanding or comprehension

    Negative Logits
    Token        Logit
    allis        -0.16
    IENT         -0.16
    occ          -0.15
    inaire       -0.15
    isVisible    -0.14
    tlement      -0.14
    alleries     -0.14
    igure        -0.14
    utos         -0.14
    er           -0.13
    Positive Logits
    Token        Logit
    ensively     0.42
    ension       0.41
    ensions      0.41
    ensible      0.40
    ending       0.39
    ensive       0.37
    ensi         0.36
    ended        0.35
    ens          0.30
    ends         0.30
    Activation Density: 0.012%

    No Known Activations