INDEX
Explanations
titles or headings
specific high-frequency characters or symbols, particularly the character 'Ŀ'
Negative Logits
disadvant    -0.85
psychiat     -0.70
condem       -0.70
contrace     -0.69
ponder       -0.69
behavi       -0.69
likeness     -0.68
unemploy     -0.68
obser        -0.67
floppy       -0.67
Positive Logits
ï¸ı    0.96
°    0.92
¯    0.86
º    0.86
ï¸    0.85
é¾į    0.81
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    0.80
âĪ    0.79
âĻ¥    0.79
log    0.79
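The positive-logit tokens above look garbled, but they are consistent with vocabulary strings from a byte-level BPE tokenizer, where every raw byte is shown as a printable stand-in character. The sketch below assumes the GPT-2 style byte-to-character table (the page itself does not name the tokenizer, so this is an assumption), and the helper names `bytes_to_unicode` and `decode_token` are illustrative only. Tokens that hold only part of a multi-byte UTF-8 sequence cannot be decoded on their own and come back as the replacement character.

```python
def bytes_to_unicode():
    """Build a GPT-2 style byte -> printable-character table.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token):
    """Map a vocabulary string back to text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Partial UTF-8 sequences decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e2\u013b\u00a5", "\u00e9\u00be\u012f", "log"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, several of the listed tokens decode to symbols (a heart, a box-drawing rule, a variation selector), which fits the explanation that the feature promotes special characters rather than ordinary word pieces.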
Activations Density 0.164%