INDEX
Explanations
phrases indicating focus on a specific concept or topic
instances of the word "this"
Negative Logits
ãĤ¹ãĥĪ        -0.97
è£ıè¦ļéĨĴ     -0.74
omo           -0.70
ãĥĭ           -0.68
ãĥı           -0.68
ivas          -0.67
Īè            -0.67
ielding       -0.67
berman        -0.66
umer          -0.66
POSITIVE LOGITS
week          0.88
guy           0.87
trope         0.83
madness       0.81
morning       0.79
incarnation   0.78
sucker        0.77
kind          0.77
weekend       0.77
installment   0.76
Activations Density 0.310%