INDEX
Explanations
phrases related to rankings or lists, especially superlative terms like "Top"
references to rankings or lists
Negative Logits
merce        -0.84
issance      -0.71
hanged       -0.69
pci          -0.69
peacefully   -0.67
ually        -0.65
¯¯           -0.64
URA          -0.64
éĹ           -0.64
urion        -0.64
Positive Logits
Top           3.57
Top           2.69
TOP           2.15
top           2.13
top           2.08
Bottom        1.97
TOP           1.78
Bottom        1.45
tops          1.43
tops          1.42
Activations Density 0.013%