INDEX
Explanations
dates and locations mentioned in news articles
Negative Logits
Token       Logit
Boo         -0.55
incompet    -0.54
Wee         -0.53
curls       -0.52
gore        -0.51
Grow        -0.51
dule        -0.51
gib         -0.50
Kirin       -0.50
Geh         -0.49
Positive Logits
Token       Logit
MEN         0.63
WASHINGTON  0.62
âĶĢ         0.62
SHARE       0.61
impl        0.60
BALL        0.60
CONT        0.60
atform      0.59
wikipedia   0.57
arbon       0.57
Activations Density: 0.097%