INDEX
    Explanations

    mentions of locations or specific names

    Negative Logits
    Token           Logit
    "sburgh"        -0.90
    " wrench"       -0.77
    " Kinn"         -0.65
    "ually"         -0.64
    "vironment"     -0.64
    " Takeru"       -0.63
    "ettings"       -0.61
    "urally"        -0.60
    "ysis"          -0.59
    "İĭ"            -0.59
    Positive Logits
    Token           Logit
    "aday"          1.09
    "riers"         1.06
    "ouk"           1.04
    "rier"          1.02
    "thing"         0.98
    "rer"           0.92
    "agher"         0.89
    "rug"           0.85
    "bent"          0.84
    "rak"           0.83
    Activation Density: 0.014%

    No Known Activations