INDEX
Explanations
the verb "do" in various forms and contexts
Negative Logits
aint       -0.15
hood       -0.15
mania      -0.15
wards      -0.15
ائج        -0.15
athan      -0.15
udio       -0.15
cy         -0.15
opoulos    -0.15
anta       -0.15
Positive Logits
cket           0.22
justice        0.21
ctest          0.21
wrong          0.20
ctr            0.19
differently    0.18
ÅĤÄħ           0.18
oming          0.18
oooo           0.17
ctors          0.17
Activations Density 0.056%