Explanations
phrases or terms indicating significant amounts or levels of impact
Negative Logits
vzor         -0.52
utih         -0.50
(blank)      -0.49
stereotypes  -0.48
must         -0.48
ộp           -0.48
ed           -0.48
o            -0.46
vir          -0.46
paka         -0.46
Positive Logits
considerable  2.03
considerable  1.88
substantial   1.80
Substantial   1.66
substantial   1.63
Considerable  1.62
stantial      1.53
sizable       1.42
sizeable      1.41
appreciable   1.30
Activations Density 0.332%