Explanations
The presence of the word "ha" and its variations, related to emotional expressions or laughter.
Negative Logits
Token   Logit
H       -0.89
HA      -0.79
Ha      -0.75
Hi      -0.69
Ho      -0.66
HO      -0.65
Han     -0.62
ฮ       -0.61
HJ      -0.60
HT      -0.59
Positive Logits
Token   Logit
her     1.45
hy      1.28
here    1.28
has     1.25
his     1.24
h       1.23
hon     1.23
hor     1.20
hen     1.20
har     1.18
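The negative and positive logit lists are the tokens whose output logits this feature most decreases and increases. The page does not say how they were computed. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix, then takes the k lowest and k highest entries. The minimal sketch below uses a hypothetical two-dimensional toy model. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `w_u` are illustrative and do not come from any particular library.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding (assumed method).

    decoder_dir has length d_model; w_u[i][j] is the unembedding weight from
    residual dimension i to vocabulary token j (shape d_model x vocab_size).
    """
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir and w_u disagree on d_model")
    return {
        tok: sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked))[:k]   # most positive first
    return bottom, top


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
vocab = ["H", "Ha", "her", "has"]
decoder_dir = [1.0, -0.5]
w_u = [
    [-0.4, -0.2, 0.6, 0.5],
    [1.0, 0.8, -0.4, -0.3],
]

bottom, top = top_and_bottom(logit_effects(decoder_dir, w_u, vocab), k=2)
print("Negative:", [(t, round(v, 2)) for t, v in bottom])  # [('H', -0.9), ('Ha', -0.6)]
print("Positive:", [(t, round(v, 2)) for t, v in top])     # [('her', 0.8), ('has', 0.65)]
```

On a real model the same dot product runs over the full vocabulary, with k = 10, to produce lists like the ones shown.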
Activation Density: 0.241%
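An activation density of 0.241% means the feature fires on about 241 of every 100,000 tokens. The page states neither the dataset nor the firing threshold. The sketch below therefore assumes that density is the percentage of sampled tokens whose activation is strictly greater than zero. The function name `activation_density` and the synthetic sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 241 activate the feature.
acts = [0.0] * 100_000
for i in range(241):
    acts[i * 400] = 1.0  # arbitrary positive activation at distinct, spread-out positions

print(f"{activation_density(acts):.3f}%")  # 0.241%
```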