INDEX
Explanations
words related to continuity or remaining unchanged
instances of the word "remain" indicating ongoing states or conditions
Negative Logits
Token       Logit
ramid       -0.86
ongyang     -0.70
Tribune     -0.67
ranch       -0.66
ilar        -0.65
ounter      -0.64
insula      -0.63
Circuit     -0.62
isson       -0.61
ilateral    -0.61
Positive Logits
Token       Logit
unchanged   1.10
unanswered  0.97
undecided   0.93
intact      0.92
afloat      0.88
unaffected  0.83
unsolved    0.83
rences      0.79
untouched   0.79
nces        0.78
Activations Density: 0.023%