Explanations
locations or places
instances of correspondents and communication-related terms in news contexts
Negative Logits
-)          -0.91
?)          -0.79
)?          -0.74
)}          -0.73
doesnt      -0.73
?]          -0.71
meanwhile   -0.70
)</         -0.70
)))         -0.67
+)          -0.67
Positive Logits
—"                0.70
proclaiming       0.68
frantically       0.67
.[                0.64
furiously         0.62
enthusiastically  0.62
—                 0.61
.                 0.60
ostensibly        0.59
abruptly          0.58
Activations Density: 1.233%