INDEX
Explanations
brand names
sections of text that are empty or contain specific formatting without meaningful content
Negative Logits
henko     -0.67
lla       -0.65
cano      -0.64
lished    -0.62
coni      -0.61
IRO       -0.60
recated   -0.60
cx        -0.59
itals     -0.59
lli       -0.59
Positive Logits
igans     1.27
ittle     1.11
igan      1.07
abies     0.99
enburg    0.98
ibrary    0.96
ounge     0.96
igible    0.94
isted     0.91
abor      0.90
Activations Density 0.083%