Explanations
other categories or alternatives
Negative Logits
Typically  0.45
typically  0.42
Typically  0.42
സാധാരണ  0.40
typical  0.38
declaratory  0.38
typically  0.38
meist  0.37
generalmente  0.37
очередной  0.37
Positive Logits
другие  0.62
வேறு  0.61
અન્ય  0.57
інші  0.57
diferentes  0.56
다른  0.55
більш  0.55
kleinere  0.54
ভিন্ন  0.53
farklı  0.53
Activations Density 0.052%