INDEX
Explanations
conditional statements indicating a hypothetical scenario
Negative Logits
Flavoring    -0.85
ibur         -0.77
vantage      -0.77
ÙĴ           -0.73
etts         -0.71
ée           -0.70
asus         -0.69
yna          -0.69
Republic     -0.68
oeuv         -0.68
Positive Logits
technically    0.94
they           0.88
outnumbered    0.75
admittedly     0.73
it             0.73
THEY           0.72
SOME           0.71
theoretically  0.68
ostensibly     0.67
hindsight      0.67
Activations Density 0.093%