    Negative Logits
    Token         Logit
    "ÄŁ"          -0.96
    "ARGET"       -0.82
    "ITAL"        -0.81
    "========"    -0.80
    "===="        -0.79
    "IFA"         -0.78
    "======"      -0.77
    "ORT"         -0.76
    "REM"         -0.75
    "̶"           -0.73

    Positive Logits
    Token         Logit
    "hound"        1.19
    "berry"        0.91
    " grey"        0.87
    "grey"         0.84
    " wolf"        0.80
    " coloured"    0.79
    " shading"     0.78
    "washing"      0.78
    " colour"      0.78
    "igans"        0.77
    Activation Density: 0.004%

    No Known Activations