Explanations
phrases reflecting nuanced opinions or contrasts regarding experiences and perceptions
Negative Logits
sice       -0.83
zwar       -0.78
therefore  -0.77
therefore  -0.71
thus       -0.70
infatti    -0.70
λοι        -0.67
ailleurs   -0.63
thus       -0.61
เลย        -0.58
Positive Logits
also          1.13
nonetheless   1.07
ändå          1.05
nevertheless  1.02
digress       0.98
一方で        0.97
samtidigt     0.96
却是          0.96
também        0.96
también       0.95
Activations Density 0.818%