Explanations
positive expressions or compliments
positive adjectives and expressions of approval
Negative Logits
è¦ļéĨĴ        -1.01
conservancy   -0.89
Downloadha    -0.88
obook         -0.82
velop         -0.77
à¨            -0.76
é¾į           -0.76
à©            -0.75
ãĥĺãĥ©        -0.74
ãģ®å®         -0.73
Positive Logits
huh           0.92
gotta         0.87
Thoughts      0.82
kidding       0.82
Sounds        0.76
classy        0.76
Advice        0.74
coincidence   0.74
irony         0.73
ly            0.73
Activations Density 0.225%