INDEX
Explanations
references to categories in a structured format
Negative Logits
"'");    -0.84
''       -0.79
}))      -0.69
[],      -0.69
</>      -0.69
")));    -0.68
{},      -0.67
]))      -0.66
)");     -0.65
'',      -0.64
Positive Logits
category      3.49
categories    3.15
Category      2.98
category      2.87
CATEGORY      2.74
Categories    2.73
Category      2.71
categories    2.67
CATEGORY      2.46
Categories    2.45
Activations Density 0.110%