INDEX
Explanations
pronouns and possessive pronouns
the end-of-text token
Negative Logits
Niet         -0.68
Frie         -0.65
Thompson     -0.62
Dres         -0.62
Seym         -0.61
å§           -0.61
Kinnikuman   -0.60
Stevenson    -0.58
Keefe        -0.58
Schwar       -0.58
Positive Logits
sqor                 0.73
quickShipAvailable   0.73
][                   0.68
favourite            0.65
favorite             0.63
OTOS                 0.63
actionDate           0.63
osponsors            0.60
VIDEOS               0.59
cigar                0.59
Activations Density 0.053%