INDEX
Explanations
terms related to criticism and condemnation
Negative Logits
ги          -0.17
bore        -0.15
ald         -0.15
ales        -0.15
632         -0.14
684         -0.14
Madness     -0.14
uckle       -0.14
iero        -0.14
controlled  -0.14
Positive Logits
acos        0.16
oise        0.16
Zaman       0.15
ويك         0.15
XL          0.15
ーチ        0.15
ilent       0.14
atory       0.14
hur         0.14
IPA         0.14
Activations Density 0.072%