INDEX
Explanations
phrases that include the word "sort" in reference to categorization or comparison
phrases that express classification or categorization
Negative Logits
interrupted  -0.63
Madison      -0.62
PLIC         -0.61
PER          -0.60
VICE         -0.60
Dent         -0.59
PLE          -0.58
INT          -0.57
NZ           -0.56
VIS          -0.56
Positive Logits
ilege        0.88
a            0.84
ies          0.83
ie           0.82
ative        0.81
liness       0.81
olith        0.79
iple         0.77
able         0.77
entially     0.76
Activations Density: 0.033%