INDEX
Explanations
phrases emphasizing assistance or support
Negative Logits
Token               Logit
****************    -0.39
odacty              -0.37
frigor              -0.32
MacGregor           -0.32
is                  -0.31
Mon                 -0.31
Muhammadu           -0.31
vå                  -0.30
coroa               -0.30
_______________     -0.30
Positive Logits
Token               Logit
help                1.29
helps               1.25
Help                1.22
helped              1.22
helping             1.20
Help                1.20
Helps               1.19
help                1.18
Helping             1.14
Helps               1.14
Activations Density 0.051%
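
To make the density figure concrete, here is a minimal sketch of how such a value could be computed. It assumes that density means the fraction of tokens on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires
    at all. The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which 5 activate the feature.
sample = [0.0] * 9995 + [0.7, 1.2, 0.3, 2.1, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.051% corresponds to the feature firing on roughly 5 of every 10,000 tokens.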