Explanations
phrases indicating a comparison or specification of groups
Negative Logits
Token        Logit
<bos>        -0.62
(blank)      -0.56
(blank)      -0.44
AuthGuard    -0.43
↵↵           -0.41
-            -0.41
kr           -0.40
(blank)      -0.40
b            -0.39
cing         -0.38
Positive Logits
Token              Logit
AMONG              1.24
Amongst            1.16
Among              1.10
CreateTagHelper    1.03
Among              1.03
amongst            1.01
Parmi              1.00
among              1.00
blant              0.98
Среди              0.97
Activations Density 0.100%
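
As a rough illustration of the density figure, assuming it means the fraction of token positions on which the feature fires at all, the sketch below computes that fraction for a list of activations. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    (above-threshold) activation. The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 1,000.
sample = [0.0] * 999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under that reading, a density of 0.100% corresponds to the feature being active on about one token in a thousand, which fits a feature tied to a narrow set of words such as "among" and its translations.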