INDEX
Explanations
phrases that indicate comparisons or similarities
"Like" as a discourse marker
like or as with phrases
Negative Logits
Token    Logit
         -0.59
         -0.50
(        -0.46
↵↵       -0.44
X        -0.44
пу       -0.43
.        -0.43
Yet      -0.42
$        -0.42
yet      -0.42
Positive Logits
Token             Logit
CloseOperation    1.11
تقاوى    1.10
Мексичка          1.09
ViewImports       1.01
ंदीखरीदारी    0.97
تضيفلها    0.97
myſelf            0.96
wiſe              0.96
$_"               0.95
Савезне           0.95
Activations Density 0.197%