INDEX
Explanations
words connected to sorting and categorization based on specific criteria
phrases related to categorization or organization methods
Negative Logits
Token       Logit
hai         -0.77
sil         -0.76
hend        -0.75
Guard       -0.71
bj          -0.70
ibi         -0.70
kee         -0.68
bur         -0.68
assisted    -0.67
?????       -0.67
Positive Logits
Token            Logit
geography        1.14
severity         1.09
attractiveness   1.04
geographic       0.99
demographics     0.96
geographical     0.96
latitude         0.96
ethnicity        0.95
gender           0.94
nationality      0.92
Activations Density 0.257%
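
To give a sense of scale for the density figure, here is a minimal sketch of one common way such a number could be computed: the fraction of token positions at which the feature's activation is above zero. The page does not state how the density was measured, so the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 257 of them active.
sample = [1.0] * 257 + [0.0] * (100_000 - 257)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.257% would correspond to the feature firing on roughly 2 to 3 tokens out of every 1,000.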