INDEX
    Explanations

    references to visual or descriptive features related to shape or outline

    Negative Logits
    ähr         -0.18
    ebi         -0.18
    odable      -0.16
    eyse        -0.15
    sport       -0.15
    meteor      -0.15
    riott       -0.14
     squat      -0.14
    infeld      -0.13
    erap        -0.13
    Positive Logits
    659          0.15
    istrovství   0.14
    ingham       0.14
    ettle        0.14
    986          0.14
    -assets      0.13
    -Ta          0.13
    ously        0.13
    крет         0.13
    fig          0.13
    Act Density 0.008%

    No Known Activations