INDEX
Explanations
phrases related to official statements or announcements
references to press and news conferences
Negative Logits
hetti        -0.64
Honest       -0.63
Monk         -0.60
Marble       -0.60
perfected    -0.58
Ness         -0.58
Friendship   -0.58
Tavern       -0.57
Butcher      -0.57
Judd         -0.57
Positive Logits
prise        0.88
conference   0.83
conference   0.83
query        0.81
icer         0.75
release      0.74
agency       0.73
statement    0.71
release      0.69
briefing     0.69
Activations Density: 0.047%