INDEX
Explanations
expressions of personal experiences and opinions
Negative Logits
lopen    -0.17
lx       -0.15
itself   -0.15
áp       -0.15
ONGL     -0.14
ilet     -0.14
bilt     -0.14
ει       -0.14
lem      -0.14
appen    -0.14
Positive Logits
’ve      0.23
've      0.23
’m       0.22
'm       0.21
’ll      0.19
'll      0.19
’d       0.18
/us      0.17
'd       0.17
iferay   0.17
Activations Density 0.451%