INDEX
Explanations
names with the format of first name followed by last name
mentions of social media usernames or handles
Negative Logits
instincts       -0.66
ãĥĥãĥī          -0.59
Ͻ               -0.59
solicitation    -0.56
ront            -0.55
outsider        -0.55
Ĥª              -0.55
inct            -0.54
inexper         -0.53
eries           -0.52
Positive Logits
October         0.96
September       0.96
August          0.95
December        0.95
February        0.94
November        0.93
April           0.93
July            0.92
June            0.91
January         0.91
Activations Density 0.038%
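
An activation density figure like the one above is commonly read as the share of sampled token positions on which the feature fires. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the toy data are illustrative and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where the
    feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (made up for illustration): 10,000 positions, 4 of them active.
toy = [0.0] * 9996 + [1.2, 0.4, 3.1, 0.7]
density = activation_density(toy)
print(f"{density:.3f}%")
```

A density this low would mean the feature is silent on the vast majority of tokens, which is typical of sparse features.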