INDEX
Explanations
instances of specific verbs indicating actions or states
Negative Logits
nosis         -0.18
eyse          -0.18
ecko          -0.17
íĹĪ           -0.16
eph           -0.16
ï¼            -0.16
пе            -0.15
wy            -0.15
sis           -0.15
istrovstvÃŃ   -0.14
Positive Logits
onto          0.16
581           0.15
ben           0.14
brick         0.14
ering         0.14
ÏĦαν          0.14
Om            0.14
Saint         0.14
deeper        0.14
upt           0.13
Activations Density: 0.010%