INDEX
Explanations
dates or time-related information
numerical values and statistics related to serious societal issues
Negative Logits
favorably    -1.06
favor        -0.99
favored      -0.93
honors       -0.90
subsidized   -0.89
favoring     -0.88
favors       -0.87
willfully    -0.85
caliber      -0.84
flavors      -0.83
Positive Logits
However      1.60
Labour       1.57
Speaking     1.54
Scotland     1.49
BBC          1.48
Shape        1.43
Scroll       1.43
Mr           1.41
Professor    1.41
Writing      1.41
Activations Density: 0.414%