INDEX
Explanations
phrases indicating significance or importance
Negative Logits
woo             -0.13
anced           -0.12
possibilities   -0.12
à¸Ńำ            -0.12
_kind           -0.12
TypeInfo        -0.12
ìĬ¤íĬ¸          -0.12
orsch           -0.12
possibilit      -0.12
گاÙĩÛĮ          -0.12
Positive Logits
single          0.28
lin             0.26
ke              0.26
chief           0.25
primary         0.25
predominant     0.24
overriding      0.24
sine            0.23
No              0.23
prime           0.23
Activations Density: 0.124%