INDEX
Explanations
phrases indicating consequences or hypothetical situations
Negative Logits
rette             -0.15
unos              -0.14
Maj               -0.14
ker               -0.14
dden              -0.14
.scalablytyped    -0.13
maj               -0.13
_defaults         -0.13
itez              -0.13
excell            -0.13
Positive Logits
would      0.40
Would      0.35
Would      0.35
would      0.34
wouldn     0.31
zou        0.26
Wouldn     0.25
serait     0.25
skulle     0.24
würde      0.23
Activations Density 0.231%
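
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes it is the percentage of token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample values are assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 3 of 8 token positions.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.5]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.231% would mean the feature fires on roughly 2 or 3 of every 1,000 tokens.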