Explanations
expressions related to profanity and strong language
NEGATIVE LOGITS
Token         Logit
vider         -0.16
orer          -0.15
ity           -0.15
ÙĪØ±Ø´        -0.14
laden         -0.14
Rail          -0.14
setter        -0.13
icious        -0.13
Twin          -0.13
ảo            -0.13
POSITIVE LOGITS
Token         Logit
.↵↵↵↵↵↵↵↵     0.15
?url          0.15
abbage        0.15
è͵            0.15
ardım         0.14
adge          0.14
ön            0.14
.↵↵↵↵↵↵↵↵↵↵   0.14
hl            0.13
bsd           0.13
Activation Density: 0.012%