Explanations
phrases that introduce lists or recommendations
Negative Logits
Token               Logit
仕 (raw: ä»ķ)       -0.15
ochen               -0.15
ên                  -0.14
_BOTH               -0.14
icont               -0.14
ONDON               -0.13
ош (raw: оÑĪ)       -0.13
ucks                -0.13
iston               -0.13
ichen               -0.13
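Several tokens on this page, such as `ä»ķ` and `ä¸ĢäºĽ`, appear to be printed in the raw byte-level BPE form used by GPT-2-style tokenizers, where every byte is mapped to a printable character. A minimal sketch for turning such strings back into readable text, assuming the model uses the same byte-to-character table as OpenAI's GPT-2 `encoder.py` (`bytes_to_unicode`); the page does not name the tokenizer, and `decode_token` is a hypothetical helper:

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table, built the same way as GPT-2's encoder.py."""
    # Printable bytes keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (space, control chars, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Hypothetical helper: map a raw BPE token string back to UTF-8 text."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    # Tokens that hold only part of a multi-byte character decode to U+FFFD.
    return data.decode("utf-8", errors="replace")


for raw in ["ä»ķ", "ä¸ĢäºĽ", "Ġsome"]:
    print(repr(raw), "->", repr(decode_token(raw)))
```

Under this assumption `ä¸ĢäºĽ` decodes to the Chinese word for "some", which fits the English, German, and French variants in the positive list.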
Positive Logits
Token               Logit
some                0.58
some                0.46
Some                0.43
Some                0.41
SOME                0.41
一些 (raw: ä¸ĢäºĽ)   0.40
einige              0.36
.some               0.34
_some               0.34
quelques            0.33
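Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest scores over the vocabulary. A minimal pure-Python sketch, assuming that method (the page does not state how its values were computed) and ignoring details such as a final layer norm; `logit_effects`, `top_k`, and all toy values are hypothetical and do not reproduce the numbers above:

```python
def logit_effects(decoder_row, unembed, vocab):
    """Score each token j as sum_i decoder_row[i] * unembed[i][j]."""
    d_model = len(decoder_row)
    return {
        tok: sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_k(scores, k, largest=True):
    """Return the k (token, score) pairs with the largest (or smallest) scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
vocab = ["some", "ONDON", "einige"]
decoder_row = [1.0, -0.5]     # the feature's decoder direction
unembed = [                   # d_model rows x vocab columns
    [0.6, -0.1, 0.4],
    [0.2, 0.1, 0.0],
]

scores = logit_effects(decoder_row, unembed, vocab)
# Ranks "some" and "einige" as positive logits and "ONDON" as the negative one.
for tok, s in top_k(scores, 2):
    print("positive", tok, round(s, 2))
for tok, s in top_k(scores, 1, largest=False):
    print("negative", tok, round(s, 2))
```

For a real model the same computation is one vector-matrix product (decoder direction times `W_U`) followed by a top-k in each direction over the full vocabulary.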
Activation Density: 0.170%
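Activation density is usually reported as the share of sampled token positions on which the feature's activation is above zero. A minimal sketch, assuming a strict `> 0` threshold and a hypothetical list-of-sequences input (the page does not say how many tokens were sampled or which threshold was used):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                firing += 1
    # Guard against an empty sample.
    return firing / total if total else 0.0


# Hypothetical activations for two short sequences, one value per token.
acts = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(f"{activation_density(acts):.3%}")  # 1 of 8 tokens fires -> 12.500%
```

Read this way, a density of 0.170% means the feature is active on roughly 17 of every 10,000 tokens.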