Explanations
phrases that indicate statistical references or generalizations about populations
Negative Logits
inine     -0.18
ero       -0.16
اکÛĮ      -0.16
umen      -0.15
yle       -0.14
elines    -0.14
hen       -0.14
pes       -0.14
Ïĥαν      -0.14
ORS       -0.14
Positive Logits
note      0.33
note      0.27
Note      0.23
course    0.23
Note      0.20
NOTE      0.19
osu       0.19
notice    0.19
those     0.19
sted      0.19
Activations Density 0.016%
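
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample activations are all hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 2 of 12,500 token positions.
sample = [0.0] * 12_498 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.016%
```

A density this low means the feature is sparse: it is silent on almost all tokens and active only on a small, specific subset.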