INDEX
Explanations
phrases related to emotional responses and interpersonal interactions
Negative Logits
مشين    -0.62
S    -0.60
1    -0.59
f    -0.59
l    -0.59
2    -0.58
T    -0.58
K    -0.57
0    -0.57
D    -0.56
Positive Logits
pleaſure    1.27
myſelf    1.24
Anſ    1.24
reaſon    1.21
ſeveral    1.20
purpoſe    1.19
himſelf    1.14
Houſe    1.14
ſelf    1.13
ſever    1.13
Activations Density 2.760%