Explanations
words related to cruelty and suffering
Negative Logits
ird         -0.19
eri         -0.16
izu         -0.15
ute         -0.14
/live       -0.13
/use        -0.13
ifa         -0.13
ienes       -0.13
ichi        -0.13
ness        -0.13
Positive Logits
agrid       0.14
adle        0.13
linear      0.13
itere       0.13
EO          0.13
unders      0.13
rott        0.13
EventData   0.13
Interop     0.13
Basket      0.13
Activations Density: 0.035%