Explanations: specific examples or categories
Negative Logits
Token         Value
Edwards       0.46
membentuk     0.39
proposte      0.38
kultur        0.37
läuft         0.37
Westminster   0.36
jeste         0.36
hänen         0.36
bliver        0.35
kimi          0.35
Positive Logits
Token         Value
Individual    0.64
individual    0.61
individuale   0.57
индивидуа     0.52
individual    0.52
单个          0.52
конкре        0.48
Individual    0.48
Specific      0.47
TextBox       0.46
Activations Density: 0.227%