INDEX
    Explanations

    references to places where people live, specifically focusing on "homes."

    Negative Logits
    Token               Logit
    "ı"                 -2.62
    "Ŀ"                 -2.61
    "¯"                 -2.55
    (whitespace)        -2.55
    (blank)             -2.55
    (blank)             -2.55
    (blank)             -2.55
    "č↵"                -2.55
    (blank)             -2.55
    "<|outofrange|>"    -2.55
    Positive Logits
    Token         Logit
    "oque"        2.20
    "creen"       2.14
    "chool"       2.00
    "pun"         1.99
    "heet"        1.83
    "weet"        1.77
    "cript"       1.71
    " pose"       1.69
    "heets"       1.64
    " offence"    1.63
    Act Density 0.082%

    No Known Activations