INDEX
Explanations
expressions of gratitude and thankfulness
Negative Logits
Token         Logit
ince          -0.19
VICES         -0.15
ίο            -0.14
leston        -0.14
ifa           -0.14
oir           -0.14
ihan          -0.14
oplan         -0.13
themselves    -0.13
apr           -0.13
Positive Logits
Token         Logit
very          0.34
again         0.32
much          0.24
very          0.24
so            0.23
once          0.23
again         0.23
kindly        0.23
VERY          0.23
ever          0.22
Activations Density: 0.033%