Explanations
alternative actions or choices
the phrase "instead" indicating alternative actions or perspectives
Negative Logits
vez            -0.67
neighbourhood  -0.66
aph            -0.65
foundations    -0.65
Que            -0.60
dies           -0.60
dirty          -0.60
derby          -0.59
ties           -0.59
foundation     -0.58
Positive Logits
ctr            0.74
zbek           0.71
ortun          0.69
replace        0.69
chart          0.69
heses          0.68
opting         0.66
terness        0.64
ertodd         0.63
hesis          0.63
Activations Density: 0.022%