INDEX
Explanations
references to billboards and disclaimers in text
references to billboards or advertising displays
Negative Logits
ever            -0.76
arnaev          -0.74
de              -0.73
ivas            -0.72
othes           -0.72
umar            -0.67
ube             -0.67
rity            -0.66
iencies         -0.65
ERAL            -0.65
Positive Logits
billboards      1.37
billboard       1.35
advertising     0.92
ModLoader       0.80
posters         0.77
slogan          0.77
advertisements  0.76
salesman        0.75
gimmick         0.74
signage         0.73
Activations Density: 0.008%