INDEX
Explanations
characters with a special symbol before and after their name
instances of the character "Ŀ" in the text
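Strings such as "Ŀ" in the explanation above, and "âĶĢâĶĢ" or "é¾į" in the positive logits below, look garbled but are probably raw byte-level BPE token strings. The sketch below assumes, without confirmation from this page, that the model uses a GPT-2-style byte-level tokenizer. Under that assumption, each visible character stands for one byte. It rebuilds the standard GPT-2 byte-to-character table and maps tokens back to their bytes. The helper names `token_bytes` and `decode_token` are hypothetical.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable characters map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Every remaining byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level token string (hypothetical helper)."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def decode_token(token: str) -> str:
    """Decode to text; lone or truncated UTF-8 sequences become U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")


for tok in ["Ŀ", "°", "âĻ", "âĻ¥", "âĶĢâĶĢ", "âĢº", "é¾į", "ÃĽ", "ï¸ı"]:
    print(repr(tok), token_bytes(tok).hex(" "), repr(decode_token(tok)))
```

Under this mapping, "Ŀ" is the single byte 0x9D and "°" is the single byte 0xB0. Both are UTF-8 continuation bytes that only form a character together with neighbouring tokens. "âĻ¥" decodes to "♥", "âĶĢâĶĢ" to "──", "âĢº" to "›", "é¾į" to "龍", "ÃĽ" to "Û", and "ï¸ı" to U+FE0F (variation selector-16).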
Negative Logits
disadvant     -0.84
warr          -0.83
ende          -0.78
psychiat      -0.72
incorpor      -0.72
secretaries   -0.71
perspect      -0.70
answ          -0.70
unemploy      -0.69
misunder      -0.69
Positive Logits
ï¸ı           1.02
°             0.93
é¾į           0.86
âĻ            0.84
ÃĽ            0.83
º             0.81
âĶĢâĶĢ        0.81
âĢº           0.77
âĻ¥           0.76
ï¸            0.74
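Ranked lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. This page does not say how its logits were computed, so treat the following as a minimal sketch under that assumption. The functions `logit_effects` and `top_and_bottom` are hypothetical. The two-dimensional vectors and three-token vocabulary are toy data, not values from this feature.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Direct logit effect of one feature on each token: decoder_dir . W_U[:, token]."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy data; a real model has many more dimensions and tens of thousands of tokens.
decoder_dir = [1.0, -0.5]
unembed = {"°": [0.8, -0.2], "warr": [-0.6, 0.5], "the": [0.1, 0.2]}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print("negative:", negative)
print("positive:", positive)
```

Here "warr" scores about -0.85 and "°" about 0.90, so each tops its respective list. A real dashboard would use k = 10 over the full vocabulary and might also rescale the scores.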
Activation Density: 0.123%
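Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation above zero. The page does not state its threshold or sample size. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 123 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(123):
    acts[i * 800] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.123%
```

On this reading, a density of 0.123% means the feature is active on roughly 1.23 tokens per 1,000, which makes it a sparse feature.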