    Explanations

    references to actions or interactions among characters

    Negative Logits
    Token         Logit
    " ir"         -0.20
    " isl"        -0.18
    "isl"         -0.18
    " iris"       -0.18
    " iv"         -0.17
    "ir"          -0.17
    " iphone"     -0.16
    " island"     -0.16
    " inter"      -0.16
    " iz"         -0.16
    Positive Logits
    Token         Logit
    " イ"          0.37
    " Ind"         0.37
    " Im"          0.36
    " Ин"          0.34
    " Ins"         0.32
    "Im"           0.32
    " Ing"         0.31
    "Ind"          0.31
    " Ill"         0.31
    " Ін"          0.31
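GPT-2-style byte-level BPE tokenizers store every token as a string of stand-in characters, one per UTF-8 byte, so raw vocabulary entries for non-Latin tokens look garbled: " イ" is stored as "ĠãĤ¤" and " Ин" as "ĠÐĺÐ½". The sketch below rebuilds that byte-to-character table and decodes such strings back to text. It assumes this model's tokenizer uses the GPT-2 mapping; the helper names `bytes_to_unicode`, `BYTE_DECODER`, and `decode_token` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style byte -> printable-character table (assumed scheme)."""
    # Printable Latin-1 bytes are represented by themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256
    # (so the space byte 0x20 becomes "Ġ", U+0120).
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a raw byte-level token string into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ĠãĤ¤", "ĠÐĺÐ½", "ĠÐĨÐ½", "ĠInd"]:
    print(repr(decode_token(tok)))
```

Decoding at the byte level rather than per character matters because a single visible character such as "И" spans two stand-in characters ("Ðĺ").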
    Activation Density: 0.083%
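Activation density is commonly reported as the share of token positions on which a feature's activation is above zero, so 0.083% corresponds to roughly 83 firing tokens per 100,000. The sketch below computes it under that assumed definition; the threshold and the dataset behind the figure above are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[Sequence[float]],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` holds one list of per-token activations per sequence.
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    # Guard against an empty sample instead of dividing by zero.
    return active / total if total else 0.0


# One firing token out of 1,000 positions gives a density of 0.1%.
density = activation_density([[0.0] * 999 + [1.2]])
print(f"{density:.3%}")
```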

    No Known Activations