INDEX
Explanations
letters or symbols that indicate some sort of emphasis or special character in text
peculiar or non-standard characters in the text
Negative Logits
Pony      -0.79
Crus      -0.74
Seym      -0.70
trainers  -0.70
*/(       -0.70
Vaugh     -0.68
therap    -0.66
ukong     -0.65
skelet    -0.65
emot      -0.65
Positive Logits
ï¸ı           1.41
¯             0.95
nai           0.90
ña            0.90
âĢ¢âĢ¢âĢ¢âĢ¢  0.89
uthor         0.88
ï¸            0.88
ñ             0.88
£             0.86
¢             0.84
Activations Density 0.109%
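
Several of the positive-logit tokens (for example ï¸ı and âĢ¢âĢ¢âĢ¢âĢ¢) look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to map such strings back to the characters they encode. It assumes the tokens use the GPT-2-style byte-to-character table, which this page does not state; the names byte_level_alphabet, CHAR_TO_BYTE and decode_token are hypothetical helpers for illustration, not part of any dashboard API.

```python
def byte_level_alphabet():
    """Build a GPT-2-style table mapping each byte (0-255) to one printable character.

    Assumption: the tokens on this page were rendered with this table.
    """
    # Bytes that already have a printable, non-whitespace Latin-1 character
    # are mapped to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_to_char = {b: chr(b) for b in printable}

    # Every remaining byte is assigned the next code point from U+0100 upward.
    shift = 0
    for b in range(256):
        if b not in byte_to_char:
            byte_to_char[b] = chr(256 + shift)
            shift += 1
    return byte_to_char


# Inverse table: displayed character -> underlying byte value.
CHAR_TO_BYTE = {c: b for b, c in byte_level_alphabet().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into text.

    Tokens holding only part of a multi-byte UTF-8 character cannot be decoded
    on their own; those bytes are rendered as U+FFFD. Characters outside the
    table raise KeyError.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĢ¢âĢ¢âĢ¢âĢ¢", "ï¸ı", "ï¸", "£"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, âĢ¢âĢ¢âĢ¢âĢ¢ decodes to four bullet characters (U+2022) and ï¸ı to the variation selector U+FE0F. Tokens such as ï¸, £ and ¢ are incomplete UTF-8 sequences on their own, which fits the explanations above about special or non-standard characters.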