INDEX
Explanations
phrases or sentences starting with "Even if"
conditional phrases indicating hypothetical scenarios
Negative Logits
Flavoring    -0.80
ibur         -0.80
pour         -0.70
opter        -0.70
Domin        -0.67
vantage      -0.66
press        -0.66
etts         -0.65
Republic     -0.65
idespread    -0.65
Positive Logits
they               0.84
you                0.80
SOME               0.77
technically        0.77
it                 0.72
fy                 0.71
hypot              0.70
theoretically      0.69
unintentionally    0.66
THEY               0.66
Activations Density: 0.030%