Explanations
instances of the word "ha," indicating laughter or amusement
Negative Logits
Token                Logit
SpringRunner         -0.53
tidumbre             -0.51
wpi                  -0.50
(token not shown)    -0.49
Dunkel               -0.49
mtr                  -0.48
oner                 -0.47
שלו                  -0.46
gerald               -0.46
pozorn               -0.46
Positive Logits
Token    Logit
ha       2.31
HA       1.49
Ha       1.44
ha       1.43
Ha       1.30
HA       1.03
ха       0.91
ハ       0.88
हा       0.87
haa      0.82
Activations Density: 0.008%
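
To put the density figure in perspective, 0.008% corresponds to roughly 8 active tokens per 100,000. The sketch below shows one way such a figure could be computed, assuming density means the fraction of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard may define it differently.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    # Count tokens on which the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 8 of which activate the feature.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

A feature this sparse is consistent with the explanation above: it fires only on a narrow set of tokens rather than on text in general.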