INDEX
    Explanations
    Negative Logits
        " work"         -0.08
        " thus"         -0.07
        " disk"         -0.07
        " skin"         -0.07
        "Film"          -0.07
        " lust"         -0.07
        " Work"         -0.07
        " Freeman"      -0.07
        " fire"         -0.07
        " flare"        -0.07
    Positive Logits
        " category"      0.18
        " categories"    0.16
        " Category"      0.12
        " CATEGORY"      0.10
        "categories"     0.10
        "category"       0.10
        " categor"       0.10
        "Category"       0.10
        " categoria"     0.09
        " Categories"    0.09
    Act Density 0.027%

    No Known Activations