INDEX
Explanations
phrases indicating consequences or cause-effect relationships
Negative Logits
både         -0.19
addtogroup   -0.17
anden        -0.17
allerdings   -0.16
sondern      -0.15
however      -0.15
ocre         -0.15
jedoch       -0.15
však         -0.14
ượt          -0.14
Positive Logits
/or          0.85
/OR          0.48
rogen        0.39
/of          0.35
rew          0.34
rog          0.34
наче         0.33
/o           0.32
hra          0.30
erson        0.28
Activations Density 2.005%