INDEX
Explanations
conditional statements indicating hypothetical scenarios
hypothetical scenarios or conditional statements
Negative Logits
interstitial    -0.68
cringe          -0.65
Beware          -0.64
atile           -0.61
provoke         -0.61
aina            -0.60
oris            -0.58
Ãį              -0.58
enos            -0.57
bru             -0.57
Positive Logits
hadn            1.01
weren           0.94
existed         0.88
Had             0.88
normalized      0.85
instead         0.83
were            0.82
stayed          0.79
hered           0.78
substituted     0.77
Activations Density: 0.255%