Explanations
explicit sexual and vulgar language
Negative Logits
Data            0.42
العناصر         0.42
Kör             0.41
conduc          0.40
Bedarf          0.40
ならでは        0.40
постепен        0.40
postdoctoral    0.40
پڑھی            0.39
Serrurier       0.39
Positive Logits
fucking         1.23
fuck            1.20
asshole         1.15
shitty          1.13
fucked          1.11
merda           1.11
shit            1.09
Fuck            1.09
Fuck            1.07
fuck            1.06
Activations Density: 0.038%