Explanations
proper names
expressions of emotions or personal opinions
Negative Logits
Token         Logit
iaries        -0.75
(missing)     -0.75
outset        -0.73
Attempts      -0.73
onding        -0.72
earchers      -0.71
igent         -0.70
utterstock    -0.68
istrates      -0.68
permitting    -0.67
Positive Logits
Token         Logit
âĢ            0.99
Kanye         0.98
Elsa          0.98
âĢ            0.96
âĿ            0.96
he            0.94
Beyon         0.90
Blizz         0.90
Rih           0.90
Kyl           0.89
Activations Density 0.591%