INDEX
Explanations
references to Wikipedia
mentions of Wikipedia and related references
Negative Logits
uncond     -0.69
atch       -0.65
pressed    -0.65
hea        -0.64
festive    -0.61
erc        -0.60
tal        -0.60
beads      -0.59
ebted      -0.59
icip       -0.59
Positive Logits
Wikipedia   3.87
Wikipedia   3.27
wik         2.57
Wikimedia   2.52
wikipedia   2.51
Wik         2.05
ipedia      1.99
Wiki        1.98
wiki        1.92
Wik         1.91
Activations Density 0.020%