INDEX
    Explanations

    words related to definitions or explanations

    phrases that define various concepts and terms

    Negative Logits
    Token           Logit
    " outweigh"     -0.83
    "umbn"          -0.73
    " heels"        -0.72
    "BLIC"          -0.72
    "aldi"          -0.70
    "ibling"        -0.69
    "ersen"         -0.69
    "sync"          -0.68
    " applaud"      -0.67
    "ÃĥÃĤ"          -0.64
    Positive Logits
    Token           Logit
    " CoC"           0.80
    " boundaries"    0.79
    " thresholds"    0.72
    " defining"      0.71
    "initions"       0.71
    "Characters"     0.70
    " meaning"       0.69
    " definitions"   0.68
    " Species"       0.68
    "Category"       0.67
    Activation Density: 0.164%

    No Known Activations