INDEX
Explanations
phrases indicating uncertainty or contrast in information
instances of uncertainty or conditional expressions
Negative Logits
Token       Logit
Skydragon   -0.80
hitting     -0.70
Travels     -0.68
ands        -0.68
uilding     -0.67
PLUS        -0.66
resa        -0.65
aturday     -0.63
ateurs      -0.63
earchers    -0.62
Positive Logits
Token          Logit
admittedly     1.13
occasional     0.91
technically    0.88
occasionally   0.85
somewhat       0.83
nonetheless    0.83
exceptions     0.82
varies         0.81
caveats        0.80
may            0.80
Activations Density 0.255%
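
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes density means "fraction of positions with activation above a threshold", and the function name, the threshold default, and the sample values are all hypothetical rather than taken from the dashboard. Under that assumption, 0.255% corresponds to roughly one firing position in every 392 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard's exact definition is not stated here.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical per-token activations for one short sequence.
sample = [0.0, 0.0, 1.7, 0.0]
print(f"{activation_density(sample):.3%}")
```

A strict `>` comparison is used so that exact zeros, which a sparse feature produces on most tokens, never count as firing.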