INDEX
Explanations
mentions of locations away from home and interactions with people from different places
occurrences of the special character 'Ļ'
Negative Logits
disadvant  -1.09
mathemat  -0.78
contrace  -0.74
misunder  -0.74
incorpor  -0.73
lawy  -0.71
triv  -0.69
fortun  -0.67
advant  -0.66
princ  -0.66
Positive Logits
ï¸ı  1.24
ï¸  0.97
âĢº  0.86
âĸ  0.83
à¥  0.82
âĸł  0.82
女  0.82
âĶĢâĶĢ  0.78
¯¯  0.78
âĹ  0.78
Activations Density 0.283%
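
Several of the positive-logit token strings above (for example `âĢº` and `âĶĢâĶĢ`) look like byte-level BPE surface forms, where each raw byte is displayed as a stand-in printable character. The source does not name the tokenizer, so the following is only a sketch under the assumption that the strings use the GPT-2 style byte-to-character table. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table, modeled on GPT-2 style byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are shifted to code points starting at 256.
            table[b] = chr(256 + shift)
            shift += 1
    return table


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a displayed token string back to text (assumes the table above)."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. already-decoded CJK) pass through.
            raw.extend(ch.encode("utf-8"))
    # Tokens that hold only part of a multi-byte character decode to U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĢº", "âĸł", "âĶĢâĶĢ", "女"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, the complete tokens decode to symbols and box-drawing characters, while entries such as `âĸ` or `à¥` are prefixes of multi-byte characters and cannot be decoded on their own.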