INDEX
Explanations
conditional statements or hypothetical scenarios using the word "if"
Negative Logits
pour     -0.86
ahime    -0.81
oult     -0.81
olis     -0.76
ggles    -0.75
ossom    -0.73
berus    -0.73
uct      -0.73
iband    -0.69
ricks    -0.69
Positive Logits
they             0.92
unwittingly      0.79
unintentionally  0.78
outnumbered      0.77
THEY             0.77
it               0.77
warranted        0.76
soever           0.75
inadvertently    0.74
technically      0.74
Activations Density: 0.081%