Explanations
indicates a consequence or result
Negative Logits
8       0.70
7       0.65
અને     0.60
9       0.58
1       0.56
6       0.55
그래도  0.54
Анд     0.54
5       0.54
जोकि    0.49
Positive Logits
också         0.51
auch          0.47
ook           0.46
myös          0.44
barkeit       0.43
ly            0.43
Mga           0.43
consequently  0.41
者的          0.41
tedy          0.40
Activations Density 0.022%