Explanations
phrases expressing evaluations or judgments, particularly using intensity modifiers
Negative Logits
ulumi         -0.18
Ñıг           -0.16
swers         -0.16
ownik         -0.16
itti          -0.16
ãĥĹ           -0.15
awei          -0.15
ุà¸Ĺà¸ĺ       -0.14
DidChange     -0.14
fusc          -0.14
Positive Logits
glad          0.22
impressed     0.20
sure          0.19
tempted       0.18
worth         0.17
lots          0.17
pleased       0.17
convinced     0.16
true          0.15
onth          0.15
Activations Density 0.238%