INDEX
Explanations
statistical data or percentages in a text
phrases indicating proportions or percentages
Negative Logits
agate        -0.86
locality     -0.74
messenger    -0.70
illary       -0.68
Trend        -0.68
ieu          -0.67
upon         -0.65
intent       -0.65
ocratic      -0.64
Globe        -0.64
Positive Logits
acebook      0.70
arser        0.70
agascar      0.68
ensibly      0.66
inguished    0.65
theless      0.65
eteen        0.64
irds         0.64
ecided       0.62
irteen       0.62
Activations Density 0.005%