Explanations
mentions of monetary values or prices
Negative Logits
incl        -0.15
nte         -0.15
inem        -0.15
adlo        -0.14
ooth        -0.14
اتÙĩ        -0.13
/end        -0.13
ereotype    -0.13
emailer     -0.13
ointed      -0.13
Positive Logits
ing         0.18
doom        0.15
ugu         0.15
anity       0.14
rain        0.14
iy          0.14
wiÄħ        0.14
hal         0.14
erman       0.14
ï¸ı         0.14
Activations Density 0.009%