INDEX
Explanations
descriptive phrases introducing content
end-of-text tokens, or tokens that represent empty content
Negative Logits
anamo    -0.71
76561    -0.70
Izan     -0.68
onto     -0.67
nown     -0.66
adle     -0.66
aths     -0.62
witz     -0.62
omo      -0.61
ially    -0.61
Positive Logits
week         0.88
article      0.86
month        0.80
particular   0.78
year         0.78
item         0.78
recipe       0.78
amazing      0.76
wiki         0.75
nifty        0.75
Activations Density: 0.168%