Explanations
dates and events like games, attacks, or sentences
Negative Logits
laim        -0.79
anan        -0.70
cientious   -0.69
etimes      -0.69
*/(         -0.68
intend      -0.66
kef         -0.65
cules       -0.63
geries      -0.63
certain     -0.63
Positive Logits
east        0.76
onwards     0.65
onward      0.63
04          0.63
coasts      0.61
drills      0.59
lows        0.59
arrives     0.59
004         0.59
Tokens      0.58
Activations Density 0.121%