Explanations
phrases indicating quotes or attributed speech
Negative Logits
©          -0.16
вий        -0.15
æŀ         -0.14
Treasure   -0.14
reno       -0.14
kses       -0.14
said       -0.13
_sched     -0.13
/--        -0.13
dda        -0.13
Positive Logits
explains    0.28
says        0.21
explain     0.18
Says        0.18
explain     0.16
explaining  0.16
_UNIFORM    0.16
describes   0.15
notes       0.15
772         0.15
Activations Density 0.101%