INDEX
Explanations
phrases related to numerical quantities and durations
Negative Logits
ëł¥    -0.15
623    -0.15
é¦Ĩ    -0.14
prohib    -0.14
óc    -0.14
Siz    -0.14
atta    -0.14
Catal    -0.14
аÑĢÑĩ    -0.14
943    -0.14
Positive Logits
Sür    0.16
Ùħز    0.16
áºł    0.15
eldorf    0.15
uchos    0.14
¹Ħ    0.14
ยà¸ĩ    0.14
okens    0.14
ụy    0.14
yang    0.14
Activations Density 0.112%