INDEX
Explanations
phrases related to time progression
phrases indicating subjective evaluations or opinions
Negative Logits
token  logit
ãĥ¼ãĥĨãĤ£  -0.73
interstitial  -0.70
èĪ  -0.66
ij士  -0.63
ãĥĹ  -0.62
ãĤ£  -0.62
ĸļ士  -0.61
©¶æ  -0.60
¿½  -0.59
ãĥ©ãĥ³  -0.58
Positive Logits
token  logit
haha  1.47
;)  1.46
:)  1.44
anyways  1.41
lol  1.37
tho  1.33
:(  1.33
:-)  1.25
)?  1.19
anyway  1.18
Activations Density 0.689%