Explanations
texts written in a specific font style
instances of numbered bullet points or rankings
Negative Logits
withdrawal       -0.67
attent           -0.66
expenditure      -0.62
consideration    -0.61
conversion       -0.61
opt              -0.60
elimination      -0.60
glare            -0.60
hement           -0.60
deprivation      -0.60
Positive Logits
since            0.87
Anonymous        0.86
advertisement    0.85
THIS             0.82
Black            0.79
ãĥ´              0.78
eq               0.76
BU               0.76
yet              0.76
jer              0.76
Activations Density 0.217%