INDEX
Explanations
mentions of the Nobel Prize
Negative Logits
Token     Logit
erde      -0.15
yor       -0.15
uner      -0.14
iscard    -0.14
yx        -0.14
Gill      -0.14
erne      -0.14
vrier     -0.14
usch      -0.14
Cran      -0.14
Positive Logits
Token     Logit
Prize     0.30
prize     0.28
Laure     0.22
laure     0.22
-winning  0.22
Peace     0.21
prizes    0.20
award     0.20
awarded   0.20
Nobel     0.20
Activations Density: 0.002%