Explanation: negative sentiments and profanity
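Negative logits (token, value, English gloss)
  এরূপ       0.61   Bengali, "such; of this kind"
  및          0.57   Korean, "and"
  Ainsi      0.54   French, "thus"
  अवश्य      0.54   Hindi, "certainly"
  এইরূপ      0.54   Bengali, "like this; in this way"
  pertanto   0.54   Italian, "therefore"
  জনাব       0.52   Bengali, "sir; Mr."
  אך         0.52   Hebrew, "but"
  Indeed     0.50
  하였다     0.50   Korean, "did"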
Positive logits (token, value)
  fucking    0.91
  fuck       0.88
  fuck       0.88
  fucked     0.84
  Fuck       0.80
  bullshit   0.79
  shitty     0.78
  piss       0.72
  pissed     0.71
  Fuck       0.69
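The source does not say how the logit values were obtained. Dashboards of this kind commonly project a feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result; the sketch below assumes that approach and ignores the final layer norm. The helper name `logit_effects` and its arguments are hypothetical, and the toy matrices are illustrative only.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    decoder_dir: the feature's decoder vector, length d_model.
    unembed: unembedding matrix with d_model rows, each of length len(vocab).
    Assumption: the final layer norm is ignored (a common simplification).
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction with column j.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens from most negative to most positive effect.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # tokens the feature most suppresses
    top_positive = ranked[::-1][:k]  # tokens the feature most promotes
    return top_positive, top_negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = logit_effects(
    [1.0, -1.0], [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], ["a", "b", "c"], k=1
)
print(pos)  # [('c', 2.0)]
print(neg)  # [('b', -1.0)]
```

With a real model, `decoder_dir` would be one row of the autoencoder's decoder and `unembed` the model's output embedding; the lists above would then correspond to `top_positive` and `top_negative` with k = 10.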
Activation density: 0.002%
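Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. Assuming that definition (the zero threshold and the `activation_density` helper are illustrative assumptions, not taken from the source), 0.002% means the feature fires on roughly 2 of every 100,000 tokens:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 2 firing tokens out of 100,000 gives a density of 0.002%.
acts = [0.0] * 100_000
acts[10] = 3.1
acts[99_000] = 0.7
print(activation_density(acts))  # 0.002
```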