Explanations
phrases emphasizing the prevalence and significance of certain topics or elements
Negative Logits
aml        -0.15
ion        -0.14
밀         -0.14
logs       -0.14
various    -0.14
vrier      -0.13
treff      -0.13
Bottom     -0.13
oller      -0.13
holm       -0.13
Positive Logits
population    0.22
population    0.20
attention     0.20
blame         0.19
Population    0.18
credit        0.17
effort        0.17
Population    0.17
focus         0.16
work          0.16
Activation density: 0.093%
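The two summary statistics above can be illustrated with a small sketch. It rests on assumptions, not on anything the dashboard states. First, activation density is taken to be the fraction of tokens on which the feature's activation is above zero. Second, each logit list is taken to be the k highest or lowest entries of a per-token logit-weight mapping. The helper names `activation_density` and `top_and_bottom_logits` are hypothetical, and the toy inputs are invented for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires (> 0).
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def top_and_bottom_logits(logit_weights, k=10):
    """Return the k highest and k lowest (token, weight) pairs.

    Assumption: `logit_weights` maps each token string to the feature's
    direct contribution to that token's logit.
    """
    ranked = sorted(logit_weights.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]           # most negative first
    top = ranked[::-1][:k]        # most positive first
    return top, bottom


# Toy example: 93 firing tokens out of 100,000 corresponds to a 0.093% density.
acts = [1.0] * 93 + [0.0] * (100_000 - 93)
print(f"{activation_density(acts):.3%}")  # 0.093%

# Toy logit weights using a few values from the lists above.
weights = {"population": 0.22, "attention": 0.20, "aml": -0.15, "ion": -0.14}
top, bottom = top_and_bottom_logits(weights, k=2)
print(top)     # [('population', 0.22), ('attention', 0.2)]
print(bottom)  # [('aml', -0.15), ('ion', -0.14)]
```

A density of 0.093% means that, under this definition, the feature fires on roughly 93 of every 100,000 tokens. That makes it a sparse feature.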