Explanations
conditional or contrasting phrases
Negative Logits
Token      Logit
aurus      -0.16
isz        -0.16
hip        -0.14
latin      -0.14
Ñĥмов      -0.14
Socorro    -0.14
æĽ         -0.13
endi       -0.13
erate      -0.13
iate       -0.13
Positive Logits
Token      Logit
Hod        0.17
uga        0.16
Pon        0.15
ALSO       0.15
ãĥ¼ãĥª     0.15
fen        0.14
acen       0.14
йн         0.14
also       0.14
дам        0.14
Activations Density: 0.215%