INDEX
Explanations
assertions and affirmations of knowledge or capability
Negative Logits
VERY        -0.14
ệu          -0.14
càng        -0.14
informal    -0.13
umper       -0.13
ieux        -0.13
gore        -0.13
cue         -0.13
ington      -0.13
VERY        -0.13
Positive Logits
actual      0.32
actually    0.29
actual      0.27
羣æŃ£      0.26
Actual      0.24
unlike      0.24
Actual      0.22
instead     0.21
(actual     0.21
actually    0.21
Activations Density 0.445%