INDEX
Explanations
specific phrases indicating extreme or impactful situations
phrases expressing extreme consequences or situations
Negative Logits
ramid      -0.76
usk        -0.73
sson       -0.73
zman       -0.71
ilver      -0.70
byss       -0.70
ãĤ©        -0.69
redes      -0.69
intosh     -0.68
rolet      -0.68
Positive Logits
even        1.10
it          1.05
nobody      1.03
they        0.95
anyone      0.91
anybody     0.88
none        0.83
hardly      0.82
whenever    0.81
we          0.80
Activations Density 0.135%