INDEX
Explanations
mentions of Donald Trump
Negative Logits
Token       Logit
ipo         -0.18
geber       -0.16
enne        -0.16
rics        -0.15
gers        -0.15
amba        -0.15
ighbors     -0.14
Guinness    -0.14
cciones     -0.14
emplate     -0.14
Positive Logits
Token       Logit
aldi        0.18
eter        0.17
enstein     0.16
ian         0.16
ster        0.15
Nat         0.15
indle       0.15
ald         0.15
itler       0.15
nat         0.15
Activations Density: 0.016%