Explanations
instances of significant changes or effects in research results
Negative Logits
Token            Logit
ModelExpression  -0.60
дешь             -0.57
soort            -0.55
valdo            -0.55
🇴                -0.54
örté             -0.53
مُعرِّف            -0.53
esh              -0.53
показания        -0.53
איך              -0.52
Positive Logits
Token            Logit
significantly    3.54
significantly    3.07
substantially    2.85
considerably     2.74
Significantly    2.59
drastically      2.54
greatly          2.50
dramatically     2.40
markedly         2.39
appreciably      2.21
Activations Density: 0.100%