INDEX
Explanations
phrases indicating uncertainty or suggestions
conditional statements or suggestions
Negative Logits
Token     Logit
ombat     -0.89
lete      -0.87
ife       -0.85
cies      -0.83
iak       -0.82
iya       -0.78
ocaust    -0.78
ament     -0.78
atches    -0.76
emale     -0.76
Positive Logits
Token            Logit
someday          1.24
subconscious     0.80
SOME             0.75
even             0.74
sooner           0.69
tempted          0.68
unsurprisingly   0.66
accidentally     0.66
misunder         0.66
kidding          0.66
Activations Density 0.039%
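
The density figure above can be illustrated with a minimal sketch. It assumes, as is common for this kind of dashboard but not stated on this page, that density means the fraction of tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which 4 have a nonzero activation.
sample = [0.0] * 9996 + [0.7, 1.2, 0.3, 2.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.039% would mean the feature fires on roughly 4 of every 10,000 tokens, which fits a sparse feature tied to a narrow class of phrases.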