INDEX
Explanations
phrases indicating transformation or change in states
Negative Logits
arton      -0.16
allis      -0.15
dent       -0.15
olute      -0.14
sis        -0.14
unga       -0.14
_BC        -0.14
çĹĩ        -0.14
;element   -0.14
pong       -0.14
Positive Logits
aily       0.15
Ĥ¨         0.15
Scal       0.15
von        0.14
ible       0.14
Shen       0.14
timber     0.14
hed        0.13
ub         0.13
FormField  0.13
Activations Density 0.327%
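
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading on toy data; the function name `activation_density` and the `threshold` parameter are hypothetical and not taken from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires. The dashboard's exact definition is not stated.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Toy data: 1000 tokens, the feature fires on 3 of them.
toy_activations = [0.0] * 1000
toy_activations[10] = 0.8
toy_activations[500] = 1.2
toy_activations[999] = 0.4

print(f"{activation_density(toy_activations):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.327% would correspond to the feature firing on roughly 3 or 4 of every 1000 sampled tokens.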