INDEX
    Explanations

    instances of demonstrative pronouns and phrases indicating emphasis or introduction

    Negative Logits
    Token       Logit
    "ullo"      -0.15
    "Ñī"        -0.15
    "apons"     -0.13
    "nf"        -0.13
    "ulent"     -0.13
    "vr"        -0.13
    " STUD"     -0.13
    "inous"     -0.13
    " ARGS"     -0.13
    "glas"      -0.13
    Positive Logits
    Token          Logit
    " wasn"        0.20
    " was"         0.18
    " soon"        0.18
    "_was"         0.17
    " incident"    0.16
    "same"         0.16
    "#__"          0.16
    "åĽº"          0.16
    " marked"      0.15
    "å½ĵçĦ¶"       0.15
    Activation Density 0.119%

    No Known Activations