    Explanations

    references to wolves and wolf-related characters or themes

    Negative Logits
    Token            Logit
    " Kew"           -0.83
    " sth"           -0.81
    " Eud"           -0.79
    " Shet"          -0.77
    " Réponses"      -0.77
    "•••"            -0.77
    " Bany"          -0.77
    "νώ"             -0.77
    " Jwt"           -0.76
    " Edna"          -0.76
    Positive Logits
    Token            Logit
    " Wolf"           1.84
    " WOLF"           1.67
    "wolf"            1.60
    "Wolf"            1.59
    " wolf"           1.55
    " Wolfe"          1.45
    "Wolves"          1.44
    " Wolves"         1.40
    "wolves"          1.34
    " wolves"         1.33
    Activation Density: 0.012%

    No Known Activations