INDEX
Explanations
phrases indicating resistance to change or absence of significant impact
phrases that indicate a lack of change or impact
Negative Logits
alian       -0.72
multiple    -0.63
abouts      -0.63
aran        -0.63
intertw     -0.62
maneu       -0.61
indo        -0.60
jan         -0.59
Courtesy    -0.58
ortment     -0.56
Positive Logits
anymore     1.35
nor         1.34
whatsoever  1.16
anything    1.15
nor         1.04
anybody     1.01
slightest   1.00
any         0.97
either      0.95
anytime     0.89
Activations Density 0.304%