Explanations
numerical values and mathematical expressions
Profanity and swear words
strong negative emotion
Negative Logits
]]               -0.73
niedersachsen    -0.71
umgekehrt        -0.69
intenant         -0.67
Wikiseite        -0.67
étoient          -0.67
}],              -0.66
seguida          -0.65
vieles           -0.64
giebt            -0.63
Positive Logits
fucking          0.85
FUCKING          0.81
fucking          0.80
goddamn          0.78
でございます       0.78
fuck             0.77
fuck             0.76
ಠ                0.74
fuckin           0.73
doth             0.72
Activations Density 0.749%