INDEX
Explanations
numerical values and specific dates or ordinal indicators
Negative Logits
favors      -0.17
Favorite    -0.17
colorful    -0.16
-percent    -0.16
eto         -0.16
honor       -0.16
Mom         -0.16
ursed       -0.15
honors      -0.15
Pron        -0.15
Positive Logits
|_|         0.16
Formats     0.14
Guild       0.14
ben         0.14
uro         0.13
Jako        0.13
ãģŃ         0.13
ÙĨÙĩ        0.13
éĥ          0.13
é£          0.13
Activations Density 0.120%