Explanations
references to filtering methods and their efficiency
Negative Logits
wobec           -0.39
PasswordEncoder -0.35
atchewan        -0.33
bijzonder       -0.32
fréquent        -0.32
marchandises    -0.31
kalangan        -0.31
specialchars    -0.31
ویکیپدی         -0.30
Angaben         -0.29
Positive Logits
filming         0.98
filters         0.95
Plot            0.95
Fil             0.94
Plot            0.91
filmed          0.91
filtr           0.91
Fil             0.90
Filters         0.88
plot            0.85
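The page does not say how these logit lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here and not confirmed by the source, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the lowest- and highest-scoring tokens. The minimal sketch below illustrates that assumption. `top_logit_tokens`, `decoder_dir`, and `W_U` are hypothetical names, and the four-token vocabulary is made up.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k lowest- and k highest-scoring tokens for one feature.

    Assumption: the logit lists are the feature's decoder direction
    (shape [d_model]) multiplied by the unembedding matrix
    (shape [d_model, vocab_size]). All names here are hypothetical.
    """
    logits = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(logits)          # token indices, lowest score first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2 and a made-up four-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.9, -0.4, 0.1, 0.5],
                [0.3, 0.2, -0.7, 0.0]])
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With real weights and `k=10`, this would yield two ten-token lists shaped like the ones above; whether the dashboard applies any extra scaling or normalization is not stated on the page.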
Activations Density: 0.301%
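Activation density is read here as the percentage of tokens in an evaluation sample on which the feature's activation is above zero. The sample and the zero threshold are assumptions, since the page does not state them. Under that reading, 0.301% means the feature fires on roughly 3 tokens out of every 1,000. A minimal sketch of the calculation, with a hypothetical `activation_density` helper:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    `activations` holds one feature's activation per token of some
    evaluation sample; the sample and the threshold are assumptions.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy sample of ten token activations, two of them positive.
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 0.0]
print(activation_density(sample))
```

Counting strictly positive activations matches the usual ReLU-style sparse autoencoder, where an inactive feature outputs exactly zero.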