INDEX
Explanations
the word "just" in various contexts
Negative Logits
Token    Logit
aga      -0.17
razier   -0.17
agan     -0.16
just     -0.16
UGHT     -0.15
just     -0.15
ught     -0.15
elia     -0.14
abet     -0.14
/umd     -0.14
Positive Logits
Token        Logit
ifi          0.24
ifications   0.24
ifying       0.23
ifiable      0.21
ifies        0.20
ly           0.20
IFI          0.19
iciary       0.19
ifica        0.19
esen         0.19
Activations Density: 0.046%