Explanations
mentions of authority figures and their interactions with others
Negative Logits
Tikang            -0.85
:✨                -0.72
ویکیپدی           -0.63
исленность        -0.62
Hentet            -0.62
autorytatywna     -0.61
snippetHide       -0.60
                  -0.60
setVerticalGroup  -0.59
InputTagHelper    -0.59
Positive Logits
pupils     0.39
pupil      0.32
character  0.31
Kot        0.29
موا        0.29
gebiete    0.28
k          0.28
Opfer      0.28
camar      0.28
comic      0.28
Activations Density 0.678%