INDEX
Explanations
phrases indicating disbelief or uncertainty
phrases related to the lack of knowledge or awareness
Negative Logits
venge           -0.71
Desc            -0.66
coming          -0.65
Ampl            -0.64
bearer          -0.63
Enhanced        -0.63
Bout            -0.61
Assassin        -0.60
accelerating    -0.59
inav            -0.58
Positive Logits
merely          1.12
Instead         1.08
Nevertheless    1.05
instead         1.01
Nonetheless     0.99
Instead         0.92
simply          0.89
only            0.87
anyway          0.83
inconsist       0.83
Activations Density: 1.243%
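
The density figure is presumably the fraction of sampled tokens on which this feature fires. The page does not state that definition, so it is an assumption here. Under that assumption, a minimal sketch of the calculation follows. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active (nonzero). This is not confirmed by the page.
    """
    values = list(activations)
    if not values:
        # No samples: report zero rather than dividing by zero.
        return 0.0
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


# Hypothetical per-token activations for one feature (not real data).
sample = [0.0, 0.0, 0.7, 0.0, 1.3, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 1.243% would mean the feature is active on roughly one sampled token in eighty.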