INDEX
Explanations
various prefixes and suffixes in words
instances of profanity and derogatory terms
Negative Logits
hyde              -0.93
Annotations       -0.79
Macedonia         -0.77
Rica              -0.76
Puzzles           -0.74
Ĥİ                -0.73
EStream           -0.71
FactoryReloaded   -0.70
Fargo             -0.68
姫                -0.68
Positive Logits
iest               1.05
est                1.04
erb                0.89
rep                0.89
ounding            0.88
usive              0.88
uper               0.87
eful               0.87
ib                 0.86
ashed              0.85
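Assuming this is a sparse-autoencoder feature page, as the layout suggests, logit lists like the two above are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix (W_U) and reading off the tokens with the largest and smallest resulting values. The sketch below illustrates that projection in pure Python under that assumption. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not the dashboard's actual code or data.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: logit change per unit of feature activation}.

    decoder_dir: d_model floats, the feature's (hypothetical) decoder row.
    unembed: d_model rows of len(vocab) floats, i.e. W_U.
    """
    scores = [0.0] * len(vocab)
    # Accumulate decoder_dir @ W_U one residual-stream dimension at a time.
    for weight, row in zip(decoder_dir, unembed):
        for j, w_u in enumerate(row):
            scores[j] += weight * w_u
    return dict(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Split effects into the k most promoted and k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # largest values first
    negative = ranked[:k]        # most negative values first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["est", "iest", "Fargo", "hyde"]
    unembed = [
        [0.5, 0.9, -0.2, -0.6],
        [0.4, 0.2, -0.2, -0.4],
    ]
    decoder_dir = [1.0, 0.5]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```

With these toy inputs, `iest` ranks first among the promoted tokens and `hyde` first among the suppressed ones. This mirrors the shape of the lists above, not their values, which come from the real model.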
Activation Density: 0.314%
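Activation density is the share of tokens on which the feature fires. A density of 0.314% works out to roughly 3 tokens in every 1,000. The following minimal sketch assumes that "fires" means an activation strictly above zero, a common convention for ReLU-based sparse autoencoders. The `activation_density` helper and its `threshold` parameter are hypothetical and only illustrate the calculation.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations over 1,000 tokens, 3 of them nonzero.
    acts = [0.0] * 997 + [1.2, 0.4, 2.0]
    print(f"Activation density: {activation_density(acts):.3f}%")
```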