INDEX
Explanations
self-referential phrases signaling the speaker's thoughts or actions
first-person perspective statements and expressions of personal thoughts or feelings
Negative Logits
adra                 -0.71
externalActionCode   -0.70
fig                  -0.67
021                  -0.64
י                    -0.62
otiation             -0.60
WAR                  -0.59
Nanto                -0.59
edition              -0.59
裏覚醒               -0.58
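Some tokens in these lists appear in the raw dashboard output in GPT-2-style byte-level BPE form, where every byte is mapped to a printable character (for example "×Ļ", "è£ıè¦ļéĨĴ", and the "ļé" entry among the positive logits). Assuming the model uses the GPT-2 tokenizer's byte-to-unicode table (the dashboard does not say so), the sketch below reverses that mapping. Under that assumption "è£ıè¦ļéĨĴ" decodes to the Japanese 裏覚醒 and "×Ļ" to the Hebrew letter י. "ļé" is only a fragment of a multi-byte character and cannot be decoded on its own. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable-character table (assumed tokenizer)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable form (control chars, space, etc.) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token: str) -> str:
    """Map each display character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Fragments that split a multi-byte character are not valid UTF-8 on their own.
    return raw.decode("utf-8", errors="replace")

for tok in ["×Ļ", "è£ıè¦ļéĨĴ", "ļé", "Ġmight"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

The same table maps "Ġ" to a leading space. The two "might" rows in the positive list could therefore be " might" and "might" with the space hidden by the display, but the dashboard does not confirm this.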
Positive Logits
joking       0.90
invincible   0.84
kidding      0.77
kindred      0.69
innocuous    0.65
might        0.64
might        0.62
amn          0.61
ļé           0.61
'd           0.61
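The dashboard does not say how these logit values are computed. A common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding. Each vocabulary entry then gets a score for how much the feature directly pushes that token's logit up or down, and the lists show the k lowest and k highest scores. The sketch below assumes that approach, with a decoder row of shape (d_model,) and an unembedding of shape (d_model, vocab_size). The function name `logit_effects` and the random toy weights are hypothetical; details such as folding in the final LayerNorm are not covered.

```python
import numpy as np

def logit_effects(w_dec_row, W_U, vocab, k=10):
    """Direct effect of one feature's decoder direction on every token's logit.

    w_dec_row: (d_model,) decoder vector for the feature (assumed layout)
    W_U:       (d_model, vocab_size) unembedding matrix (assumed layout)
    """
    scores = w_dec_row @ W_U                 # (vocab_size,) one score per token
    order = np.argsort(scores)               # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up weights; these are not the real model's values.
rng = np.random.default_rng(0)
vocab = ["joking", "kidding", "might", "WAR", "adra", "fig"]
d_model = 8
W_U = rng.normal(size=(d_model, len(vocab)))
w_dec_row = rng.normal(size=d_model)
neg, pos = logit_effects(w_dec_row, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```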
Activation Density: 0.169%
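Activation density is usually reported as the fraction of token positions in a sample where the feature is active. The sketch below assumes "active" means an activation strictly greater than zero, since the dashboard does not state its threshold. At 0.169%, the feature would fire on roughly 1.7 of every 1,000 tokens. The function `activation_density` is a hypothetical helper written for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed > 0)."""
    total = 0
    active = 0
    for seq in activations:        # one list of per-token activations per sequence
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    return active / total if total else 0.0

acts = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(f"{activation_density(acts):.3%}")  # 1 of 8 positions -> 12.500%
```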