Explanations
words related to cultural elements and emotions
Negative Logits
Token            Logit
[token missing]  -1.09
,                -0.95
(                -0.85
-                -0.83
.                -0.82
:                -0.80
in               -0.80
-                -0.79
↵                -0.77
and              -0.75
Positive Logits
Token            Logit
Efq              1.82
myſelf           1.67
་་               1.66
―――――            1.61
Anſ              1.60
ſind             1.60
itſelf           1.54
ſelf             1.50
Houſe            1.49
Theſe            1.49
Activations Density 0.011%