INDEX
Explanations
mentions of prestigious awards, especially the Nobel Peace Prize
terms related to prestigious awards, specifically the Nobel Prize and Pulitzer Prize
Negative Logits
avage      -0.73
aves       -0.67
oute       -0.66
addock     -0.64
Antar      -0.62
Discord    -0.62
ickr       -0.62
icago      -0.62
yy         -0.62
mson       -0.62
Positive Logits
Prize      1.46
laureate   1.24
laure      1.07
prize      0.98
Nobel      0.95
Pri        0.94
Laure      0.91
Nob        0.90
Peace      0.83
medal      0.81
Activations Density: 0.021%