INDEX
Explanations
specific nouns and terms related to news articles or reports
references to specific attacks, reports, or events involving the term "the" in various contexts
Negative Logits
isphere     -0.86
thia        -0.70
distingu    -0.68
Ń·          -0.68
âĢº         -0.67
dor         -0.66
ovember     -0.65
umbered     -0.65
puted       -0.64
cliffe      -0.63
Positive Logits
bluff       1.25
shots       0.95
kettle      0.74
hotline     0.72
Shots       0.69
situation   0.63
actions     0.63
attention   0.62
act         0.62
halt        0.61
Activations Density 0.094%