Explanations
phrases that indicate expressions of agreement or acknowledgment
Negative Logits
Token      Logit
yš         -0.15
ãİ         -0.15
Ware       -0.14
ardin      -0.14
イク       -0.14
عی         -0.14
輝         -0.14
سون        -0.14
ещ         -0.13
сло        -0.13
Positive Logits
Token      Logit
uv         0.17
st         0.17
937        0.16
ett        0.15
interview  0.15
rou        0.14
澤         0.14
oder       0.14
lig        0.14
fellow     0.14
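A minimal sketch of how top positive and negative logit lists like these are commonly computed for a learned feature (for example, a sparse-autoencoder latent). It assumes the feature's effect on each vocabulary token is its decoder direction projected onto the model's unembedding matrix W_U, with no normalization. The page does not say how its values were obtained. The function names, the toy vocabulary, and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_dir @ W_U: one logit effect per vocabulary token.

    decoder_dir has length d_model; unembed (W_U) is d_model rows by
    vocab_size columns. Both names are hypothetical stand-ins.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
            for j in range(vocab_size)]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, effect) pairs; k >= 1."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]             # most negative first
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
W_U = [[0.2, 0.1, -0.3, 0.0],
       [0.0, 0.4, 0.1, -0.2]]
positive, negative = top_logits(logit_effects([0.5, 0.25], W_U), vocab, k=2)
print(positive)  # tokens "b" then "a"
print(negative)  # tokens "c" then "d"
```

Sorting once and slicing from both ends yields both lists in a single pass. That ordering matches the page's layout, with the most negative and most positive tokens listed first.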
Activation density: 0.489%
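Activation density is usually reported as the fraction of sampled token positions on which the feature fires. By that reading, 0.489% is roughly one token in 200. The sketch below assumes "fires" means an activation strictly above zero. The page does not state its threshold or the size of its token sample, and the function name is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: "active" means strictly above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy sample: the feature fires on 5 of 1,000 token positions.
sample = [0.0] * 995 + [1.2] * 5
print(f"{activation_density(sample):.3%}")  # prints 0.500%
```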