INDEX
Explanations
instances of conditional phrasing and auxiliary verbs
Negative Logits
Token    Logit
tha      -0.17
zin      -0.16
Flynn    -0.14
uced     -0.14
рд       -0.14
aping    -0.14
izoph    -0.13
atar     -0.13
inn      -0.13
aved     -0.13
Positive Logits
Token    Logit
egen     0.17
773      0.17
spot     0.16
olta     0.15
Finger   0.15
ritos    0.15
spots    0.14
chu      0.14
иск      0.14
975      0.14
Activations Density 0.179%