INDEX
Explanations
instances where a situation is being described as different from what is expected
the word "Instead" and its various forms as a pivot in discussions or arguments
Negative Logits
AZ               -0.67
Condition        -0.61
rament           -0.59
neighbourhood    -0.59
ENTS             -0.58
ental            -0.57
SF               -0.57
ties             -0.56
gin              -0.55
emate            -0.55
Positive Logits
opting           0.78
ples             0.75
terness          0.75
ertodd           0.70
ortun            0.69
chart            0.66
ilon             0.65
zbek             0.65
¬¼               0.65
preferring       0.62
Activations Density: 0.024%