INDEX
Explanations
text related to quantifying amounts or levels
phrases related to categorization or classification
Negative Logits
partName      -0.61
arton         -0.60
USS           -0.55
gov           -0.53
Adren         -0.52
Journalism    -0.52
Stras         -0.51
HuffPost      -0.51
YN            -0.50
atform        -0.50
Positive Logits
equivalents   0.70
".[           0.69
apiece        0.69
destro        0.68
.",           0.66
.''.          0.65
whereas       0.65
disadvant     0.64
."[           0.63
!".           0.61
Activations Density: 1.970%