Explanations
assertive language and expressions of opinion or argumentation
Negative Logits
Token            Logit
.                -0.60
UnusedPrivate    -0.50
dlatego          -0.49
fsp              -0.49
inoltre          -0.48
esperança        -0.47
:                -0.47
。               -0.46
elegans          -0.46
veck             -0.45
Positive Logits
Token            Logit
itſelf           0.93
technically      0.77
faſt             0.76
myſelf           0.74
ſmall            0.74
Reſ              0.73
fufficient       0.72
{},              0.71
fubject          0.70
poffible         0.70
Activations Density: 0.331%