INDEX
Explanations
negative prefixes in adjectives and adverbs
Negative Logits
964          -0.15
budd         -0.15
visibility   -0.14
_visibility  -0.14
Sang         -0.14
มาก          -0.14
late         -0.14
293          -0.13
anonymity    -0.13
俺は         -0.13
Positive Logits
ecessarily   0.20
conv         0.20
character    0.18
orth         0.17
enth         0.17
Conv         0.16
ortho        0.15
ώνα          0.15
Conv         0.15
appet        0.15
Activations Density 0.025%