INDEX
Explanations
instances of gratitude and appreciation expressed in relation to others
Negative Logits
Token     Logit
aw        -0.19
ato       -0.15
ask       -0.14
inf       -0.14
category  -0.14
heimer    -0.14
.aw       -0.14
uala      -0.14
mean      -0.13
Hed       -0.13
Positive Logits
Token        Logit
istrovství   0.17
üzel         0.17
roma         0.15
GuidId       0.15
omanip       0.15
разд         0.14
ivals        0.14
θή           0.14
rame         0.14
ś            0.14
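The positive and negative logit lists above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token results. The page does not say exactly how its values were computed (for example, whether a final layer norm was folded in), so the following is only a minimal sketch under that assumption. The function `feature_logit_effects` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive
    (token, logit effect) pairs for a feature's decoder direction."""
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(vocab)
    # Logit effect on token j is the dot product of the decoder direction
    # with column j of the unembedding matrix.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = feature_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)  # "b" has the most negative effect, "a" the most positive
```

Because the sketch uses a plain dot product, a token's position in either list depends only on how well its unembedding column aligns with the decoder direction.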
Activation Density: 0.006%
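Activation density is usually reported as the percentage of sampled tokens on which the feature fires. A density of 0.006% corresponds to roughly 6 active tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero and that activations arrive as a flat list of per-token values. `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0.0, i.e. any positive activation counts)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy usage: 6 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_994 + [1.3] * 6
print(f"{activation_density(acts):.3f}%")  # 0.006%
```

A very low density such as this one indicates a sparse feature that activates on only a small number of tokens.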