Explanations
references to victims and acts of violence or abuse
sexual acts involving her
Negative Logits
Token       Logit
…           -0.34
too         -0.32
media       -0.30
laterales   -0.28
stora       -0.28
azules      -0.27
site        -0.26
↵↵          -0.25
hard        -0.25
higher      -0.25
Positive Logits
Token            Logit
ſein             0.79
ſind             0.78
queſta           0.74
pholes           0.74
ſei              0.73
باردا            0.71
témoig           0.71
ujednoznacz      0.71
stockbild        0.70
disambiguazione  0.70
Activations Density 0.041%