INDEX
Explanations
words related to legal charges or accusations
Negative Logits
mor     -0.51
zen     -0.50
ter     -0.50
tires   -0.50
hair    -0.50
ny      -0.49
el      -0.49
cu      -0.48
cess    -0.48
é       -0.47
Positive Logits
zijne        0.63
صوتيه        0.60
kautta       0.59
poffible     0.56
maravilloso  0.56
gustaba      0.56
mío          0.55
fufficient   0.54
maravillosa  0.54
regalías     0.54
Activations Density 0.149%
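
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the tool that produced this listing.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions where the
    feature is active. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 2000 tokens.
sample = [0.0] * 1997 + [0.8, 1.2, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.149% would mean the feature is active on roughly 1 or 2 of every 1000 tokens, which is typical of a sparse feature.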