INDEX
Explanations
expressions indicating clarity or certainty in arguments
Negative Logits
ses            -0.51
ط              -0.51
MotionEvent    -0.47
Revenir        -0.47
يتيمه          -0.47
con            -0.45
<eos>          -0.45
Con            -0.45
жели           -0.45
t              -0.44
Positive Logits
itſelf         1.09
myſelf         1.01
Efq            0.87
Jefus          0.85
purpoſe        0.82
themſelves     0.81
ſelf           0.77
raiſ           0.76
leſs           0.76
reaſon         0.73
Activations Density 0.054%