INDEX
Explanations
references to controversial subjects, particularly related to language, sex, and violence in media
NEGATIVE LOGITS
cascade    -0.16
utin       -0.15
anko       -0.14
GC         -0.14
_gc        -0.14
Kun        -0.14
Cascade    -0.13
cascade    -0.13
mel        -0.13
afr        -0.13
POSITIVE LOGITS
ãĥ¼ãĥŀ      0.17
offensive    0.15
zon          0.15
Offensive    0.15
locker       0.15
spb          0.15
UnityEditor  0.15
CONTENT      0.14
content      0.14
ÑģÑİ         0.14
Activations Density: 0.309%