INDEX
Explanations
mentions of numbers at the beginning of phrases, potentially related to rankings or quantities
expressions of strong opinions or sentiments
Negative Logits
economic      -0.63
Government    -0.63
Welfare       -0.62
welfare       -0.62
administr     -0.61
preventive    -0.60
withdrawing   -0.60
lawful        -0.59
unlawfully    -0.59
Employ        -0.59
Positive Logits
cinematic     0.80
sequels       0.76
Collider      0.74
hilar         0.73
cameo         0.73
teased        0.73
soundtrack    0.72
laughs        0.71
anthology     0.70
premie        0.70
Activations Density: 3.160%