INDEX
    Negative Logits
    "↵↵↵↵           -0.10
    ='"             -0.08
    Villa           -0.08
     Villa          -0.08
    ובה             -0.07
     blo            -0.07
    OKE             -0.07
    .boot           -0.07
    "↵↵↵            -0.07
     v              -0.07
    Positive Logits
    ENSITIVE        0.09
     interpreting   0.09
     say            0.09
     CANCEL         0.08
     unbek          0.08
     interpretation 0.08
     interpret      0.08
     interpreted    0.08
     NORMAL         0.08
     DEFINE         0.08
    Activation Density: 0.009%

    No Known Activations