Explanations
words indicating agreement or affirmation
Negative Logits
(blank)    -0.81
(    -0.64
"    -0.58
“    -0.57
-    -0.54
(blank)    -0.53
T    -0.52
can    -0.51
↵↵    -0.50
set    -0.49
Positive Logits
itſelf    1.13
שוליים    1.09
myſelf    1.02
enumi    0.94
transfieras    0.90
Tembelea    0.89
تقاوى    0.87
ſelf    0.86
BagLayout    0.85
parsedMessage    0.85
Activations Density 0.543%