INDEX
Explanations
statements that convey limitations or disclaimers
Negative Logits
Token       Logit
there       -0.72
there       -0.59
دیگه        -0.57
dingen      -0.56
一堆        -0.56
they        -0.56
we          -0.56
いろんな    -0.54
好多        -0.54
endless     -0.53
Positive Logits
Token          Logit
Moreover       1.48
Moreover       1.46
Accordingly    1.44
Consequently   1.41
Nonetheless    1.39
moreover       1.38
Accordingly    1.38
Nonetheless    1.36
Nevertheless   1.36
Furthermore    1.36
Activations Density: 1.001%