INDEX
Explanations
questions starting with "Is that" or "Is it"
questions and statements that inquire about the validity or nature of a situation
Negative Logits
mares     -0.72
bies      -0.70
CTV       -0.67
aband     -0.67
phia      -0.66
banks     -0.65
roups     -0.63
tails     -0.62
ipeg      -0.61
ãģ®ç      -0.61
Positive Logits
spoiler   0.75
really    0.71
kidding   0.69
Really    0.67
orio      0.63
Ready     0.61
sexist    0.61
presum    0.61
worth     0.60
ironic    0.60
Activations Density 0.092%
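
The density figure above can be read as the share of token positions on which the feature fires. The sketch below shows one way such a number could be computed. It is not taken from the dashboard's own code: the function name `activation_density`, the zero threshold, and the toy data (92 firing positions out of 100,000) are all assumptions chosen only so the arithmetic reproduces 0.092%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0.0 here).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data, not real measurements:
# 100,000 token positions, 92 of which have a nonzero activation.
toy_activations = [0.0] * 99_908 + [1.5] * 92

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.092%
```

At this density the feature is active on roughly one token in a thousand, which fits a feature tied to a narrow pattern such as questions beginning "Is that" or "Is it".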