Explanations
phrases expressing causation or consequences
Negative Logits
umum            -0.52
yle             -0.52
setContentType  -0.52
snippetHide     -0.51
room            -0.50
styleUrls       -0.49
ovací           -0.49
Dougall         -0.49
ilerini         -0.49
ordf            -0.48
Positive Logits
caused          0.94
akibat          0.87
caused          0.86
infolge         0.86
karena          0.85
colpa           0.85
بسبب            0.85
Caused          0.84
ůli             0.83
ibatkan         0.81
Activations Density 0.419%