INDEX
Explanations
statements regarding the existence or presence of conditions and products, often with a focus on their quality or characteristics
Negative Logits
certainly   -0.16
probably    -0.15
probably    -0.15
Theodore    -0.15
ufs         -0.15
uga         -0.14
Probably    -0.14
rất         -0.14   (Vietnamese: "very")
MB          -0.14
J           -0.14
Positive Logits
вдруг       0.24   (Russian: "suddenly")
yoksa       0.22   (Turkish: "or else")
indeed      0.20
varsa       0.20   (Turkish: "if there is")
somehow     0.18
_______,    0.17
بتوان       0.16   (Persian: "be able to")
itra        0.15
truly       0.15
Indeed      0.14
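The logit values listed above are a feature's direct effect on each output token. A common way to compute such lists for a sparse-autoencoder feature (assumed here, not confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix and keep the largest and smallest entries. The sketch below uses a hypothetical helper `top_logit_effects` and made-up toy matrices. Real dashboards may also normalize or center these values.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k largest and k smallest logit effects of one SAE feature.

    Assumption (hypothetical helper): the effect on each output token is the
    feature's decoder direction projected through the unembedding matrix.
    """
    logits = w_dec_row @ w_unembed          # shape: (vocab_size,)
    order = np.argsort(logits)              # token indices, ascending by effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data (made up): d_model = 3, vocabulary of 6 tokens.
vocab = ["tok0", "tok1", "tok2", "tok3", "tok4", "tok5"]
w_dec_row = np.array([1.0, 0.0, 0.0])
w_unembed = np.array([
    [0.24, -0.16, 0.20, -0.15, 0.00, 0.10],
    [0.50,  0.50, 0.50,  0.50, 0.50, 0.50],
    [-0.3,  0.30, -0.3,  0.30, -0.3, 0.30],
])

positive, negative = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once with `np.argsort` and reading both ends gives the positive and negative lists in a single pass over the vocabulary.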
Activation density: 0.117%
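Activation density is usually read as the fraction of sampled token positions on which the feature fires. A density of 0.117% works out to about 1.17 activations per 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption: "active" means activation > threshold (0.0 by default).
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example (made up): the feature fires on 3 of 2,000 token positions.
acts = np.zeros(2000)
acts[[5, 100, 1500]] = [0.7, 1.2, 0.3]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```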