Explanations
phrases indicating personal states or experiences
Negative Logits
activeClassName    -0.15
ÑĢаÑĤ               -0.15
riend              -0.14
Repeated           -0.14
IGHL               -0.14
engo               -0.14
ÏģÏİν              -0.14
VERBOSE            -0.13
ottage             -0.13
foy                -0.13
Positive Logits
done               0.32
back               0.28
DONE               0.27
past               0.26
through            0.26
finished           0.24
settled            0.23
halfway            0.23
onto               0.23
ready              0.23
Activations Density 0.178%