Explanations
phrases indicating importance or significance
phrases indicating significance or prominence
Negative Logits
imilar      -0.78
izon        -0.75
olution     -0.72
aturated    -0.69
anners      -0.69
ikk         -0.68
bern        -0.68
Practices   -0.68
ividual     -0.68
ãĤ´         -0.67
Positive Logits
importantly     1.29
controvers      1.03
interestingly   1.02
cru             1.00
challeng        0.96
surprisingly    0.95
incidentally    0.95
fortunately     0.94
tragically      0.93
omin            0.93
Activations Density 0.122%