INDEX
Explanations
Sudan, Hungary, Hong Kong, conflict
Negative Logits
orschung        0.39
ulture          0.38
dari            0.37
она             0.37
kim             0.37
I               0.36
uruan           0.35
our             0.35
companionship   0.35
с               0.35
Positive Logits
controvers      0.52
miatt           0.45
kvůli           0.44
universidades   0.42
akka            0.42
Zeitraum        0.41
الابتدائي       0.40
Einstellungen   0.40
exigir          0.40
できない        0.40
Activations Density 0.078%