INDEX
    Explanations

    references to categories in a structured format

    Negative Logits
     "'");        -0.84
     ''           -0.79
    }))           -0.69
     [],          -0.69
    </>           -0.69
    ")));         -0.68
     {},          -0.67
    ]))           -0.66
    )");          -0.65
     '',          -0.64

    Positive Logits
     category      3.49
     categories    3.15
     Category      2.98
    category       2.87
     CATEGORY      2.74
     Categories    2.73
    Category       2.71
    categories     2.67
    CATEGORY       2.46
    Categories     2.45
    Act Density 0.110%

    No Known Activations