INDEX
Explanations
phrases indicating gratitude or appreciation
punctuation marks, particularly exclamation points and question marks
Negative Logits
mun          -0.76
enf          -0.69
zik          -0.68
oun          -0.63
trap         -0.60
esome        -0.60
ortium       -0.60
quartered    -0.60
encies       -0.59
repro        -0.59
Positive Logits
ãĥ¤          0.87
Visit        0.75
Carbuncle    0.72
Come         0.70
Wrestling    0.69
Please       0.68
Colleg       0.67
%%           0.67
Voting       0.65
Subtle       0.64
Activations Density: 0.023%