INDEX
Explanations
references to specific brands or promotional strategies
Negative Logits
Token           Logit
oret            -0.15
amina           -0.15
ARSE            -0.15
ื่              -0.14
same            -0.14
corres          -0.14
asan            -0.13
breeze          -0.13
corresponding   -0.13
stereotype      -0.13
Positive Logits
Token           Logit
behalf          0.21
imbus           0.19
meaning         0.17
matter          0.17
について        0.17
upcoming        0.17
matters         0.16
topic           0.16
situation       0.16
matter          0.15
Activations Density 0.207%