INDEX
Explanations
verbs and phrases indicating reasoning or justification
Negative Logits
ãģĸ  -0.16
uma  -0.16
oir  -0.15
çģ  -0.14
æĮ¯ãĤĬ  -0.14
ÑĤоÑĩ  -0.14
ziel  -0.14
ãĥ³ãĥĨãĤ£  -0.14
æ¾  -0.14
OrNil  -0.14
Positive Logits
sense  0.96
Sense  0.78
sense  0.76
Sense  0.69
senses  0.59
sentido  0.59
sensed  0.40
sens  0.38
ense  0.37
SEN  0.33
Activations Density: 0.026%