INDEX
Explanations
phrases that indicate direction or movement towards a specific goal or concept
Negative Logits
íģ¼        -0.15
/w         -0.15
orca       -0.14
/her       -0.14
culate     -0.14
aac        -0.14
oretical   -0.14
kır        -0.13
<<-        -0.13
ctime      -0.13
Positive Logits
/from      0.23
/about     0.22
gether     0.19
wards      0.19
whom       0.18
GGLE       0.18
towards    0.18
ness       0.17
sWith      0.17
Towards    0.17
Activations Density 0.024%