INDEX
Explanations
phrases indicating admiration or strong liking for something
phrases indicating fandom or allegiance to various subjects
Negative Logits
accounted    -0.68
dispatch     -0.64
opard        -0.64
COMPLE       -0.61
hole         -0.61
ItemImage    -0.59
cog          -0.58
?]           -0.58
BUS          -0.58
ural         -0.57
Positive Logits
76561        0.85
sorts        0.78
irlf         0.77
ours         0.69
mire         0.68
liberty      0.65
etheless     0.65
whichever    0.64
hers         0.64
yours        0.62
Activations Density: 0.078%