Explanations
expressions and phrases indicating approval or excitement
Negative Logits
Token     Logit
bes       -0.18
czy       -0.15
167       -0.15
atik      -0.15
larg      -0.14
aire      -0.14
inee      -0.14
Bes       -0.14
simul     -0.14
ensor     -0.14
Positive Logits
Token     Logit
stuff     0.20
icular    0.17
-looking  0.16
things    0.16
cool      0.16
ÛĮÙĩ      0.15
assin     0.15
_stuff    0.15
Stuff     0.15
/use      0.15
Activations Density: 0.033%