INDEX
Explanations
prepositions and phrases indicating possession or association
phrases that express comparisons or descriptors relating to characteristics or qualities
Negative Logits
çļ  -0.77
ESE  -0.77
uers  -0.76
Fs  -0.75
HAEL  -0.75
ãĥĭ  -0.74
ULTS  -0.74
ãĥ¼ãĥĨ  -0.72
ãĤº  -0.72
senal  -0.71
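Several of the negative-logit tokens (for example ãĥĭ and ãĤº) look like encoding debris, but they have the shape of byte-level BPE tokens as displayed by GPT-2-style tokenizers, where each raw byte is shown as a printable stand-in character. The page does not say which tokenizer produced them, so the following is only a minimal sketch under the assumption that the GPT-2 byte-to-character table applies. The helper names `gpt2_char_to_byte`, `token_bytes`, and `decode_token` are hypothetical and introduced here for illustration.

```python
def gpt2_char_to_byte():
    """Inverse of the GPT-2 byte-to-unicode display table (assumed tokenizer)."""
    # Bytes that are displayed as themselves: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    mapping = {chr(b): b for b in keep}
    # Every other byte is displayed as chr(256 + n), in increasing byte order.
    n = 0
    for b in range(256):
        if b not in keep:
            mapping[chr(256 + n)] = b
            n += 1
    return mapping


_TABLE = gpt2_char_to_byte()


def token_bytes(token):
    """Return the raw bytes that a displayed token string stands for."""
    return bytes(_TABLE[ch] for ch in token)


def decode_token(token):
    """Decode a displayed token to text; incomplete UTF-8 is replaced."""
    return token_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĭ", "ãĥ¼ãĥĨ", "ãĤº", "çļ"]:
        print(repr(tok), token_bytes(tok).hex(), repr(decode_token(tok)))
```

Under that assumption, ãĥĭ, ãĥ¼ãĥĨ, and ãĤº decode to the katakana ニ, ーテ, and ズ, while çļ corresponds to the bytes `e7 9a`, which are only the first two bytes of a three-byte UTF-8 character and so do not decode to a complete character on their own.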
Positive Logits
screwed  0.90
messed  0.89
neat  0.89
like  0.89
ironic  0.89
fucked  0.85
weird  0.83
goofy  0.83
bum  0.81
funny  0.80
Activations Density 0.056%