INDEX
    Explanations
    Negative Logits
        " Neville"          -0.08
        " Miche"            -0.07
        " christmas"        -0.07
        "↵↵↵↵↵↵↵↵↵↵↵↵↵↵"    -0.07
        " Yue"              -0.07
        "template"          -0.07
        "June"              -0.06
        " Augustine"        -0.06
        " witches"          -0.06
        "oley"              -0.06
    Positive Logits
        " far"              0.16
        " Far"              0.14
        "Far"               0.13
        " FAR"              0.11
        "far"               0.11
        " farther"          0.10
        "_far"              0.08
        "AR"                0.08
        " Farrell"          0.08
        " afar"             0.08
    Activation Density: 0.014%

    No Known Activations