Explanations
phrases related to making decisions or giving advice
occurrences of a specific character sequence or symbol
Negative Logits
Token            Logit
literacy         -0.71
pyramid          -0.69
lodging          -0.67
kicker           -0.67
recycling        -0.64
prostitutes      -0.64
opportunities    -0.63
Shelter          -0.62
strat            -0.62
cellphone        -0.62
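Positive Logits
Token                       Logit   Decoded bytes (assuming GPT-2 byte-level BPE display)
ï¸ı                         1.48    EF B8 8F: U+FE0F, emoji variation selector
âĶĢâĶĢâĶĢâĶĢ                 1.19    4 × E2 94 80: "────" (U+2500, box drawing)
¯¯                          1.18    AF AF: two UTF-8 continuation bytes
£                           1.05    A3: UTF-8 continuation byte
âĻ                          1.03    E2 99: incomplete, prefix of U+2640 to U+267F
¯                           1.00    AF: UTF-8 continuation byte
¢                           0.97    A2: UTF-8 continuation byte
ï¸                          0.96    EF B8: incomplete, prefix of U+FE00 to U+FE3F
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ     0.96    8 × E2 94 80: "────────"
âĶĢ                         0.94    E2 94 80: "─" (U+2500)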
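The positive-logit tokens appear to be shown in the display form used by GPT-2-style byte-level BPE tokenizers, where every raw byte is mapped to a printable character (for example, byte 0x20 is shown as "Ġ"). A minimal sketch, assuming this page uses GPT-2's byte-to-unicode table, that inverts the mapping to recover the underlying bytes and checks whether they form complete UTF-8 text. The helper names `token_to_bytes` and `describe` are hypothetical, introduced only for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable-character table (assumed display scheme)."""
    # Bytes that are already printable characters map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Every remaining byte (space, control bytes, 0x7F-0xA0, 0xAD)
    # is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Hypothetical helper: map a displayed token string back to its raw bytes."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Hypothetical helper: show the bytes and, if possible, the decoded text."""
    raw = token_to_bytes(token)
    hex_bytes = raw.hex(" ").upper()
    try:
        return f"{hex_bytes} -> {raw.decode('utf-8')!r}"
    except UnicodeDecodeError:
        # Fragments such as a lone continuation byte are not valid UTF-8 alone.
        return f"{hex_bytes} -> (incomplete UTF-8 sequence)"


for tok in ["ï¸ı", "âĶĢâĶĢâĶĢâĶĢ", "¯¯", "£", "âĻ", "ï¸", "âĶĢ"]:
    print(tok, "=>", describe(tok))
```

Under this assumption, several of the top positive tokens are byte fragments (lone continuation bytes or incomplete multi-byte prefixes) rather than whole characters, which fits the "specific character sequence or symbol" explanation.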
Activation density: 0.247%