INDEX
Explanations
first-person personal statements or reflections
Negative Logits
ousand    -0.15
↵↵        -0.15
ONGL      -0.15
ly        -0.14
bilt      -0.14
lx        -0.14
gether    -0.14
âĢı       -0.13
etter     -0.13
line      -0.13
Positive Logits
zelf      0.21
åĢij       0.16
SELF      0.16
zzo       0.16
sop       0.16
’ve       0.16
/us       0.16
öyle      0.15
’d        0.15
’ll       0.15
Activations Density: 0.466%