Explanations
negative sentiment expressions
Negative Logits
myſelf
-0.89
pleaſure
-0.86
חיצוניים
-0.82
purpoſe
-0.81
―――――
-0.79
ſy
-0.77
ⓧ
-0.77
diſt
-0.75
himſelf
-0.74
raiſ
-0.74
Positive Logits
0.55
enumi
0.55
stdc
0.47
tomat
0.45
_
0.45
__':
0.43
Utf
0.41
֔
0.41
$
0.41
elett
0.41
Activations Density 0.070%