INDEX
Explanations
phrases related to exceeding or over-extending limits or boundaries
Negative Logits
Brave    -0.71
iframe   -0.71
minster  -0.67
ioxide   -0.66
PLA      -0.64
behind   -0.63
ichael   -0.62
wick     -0.59
spot     -0.58
letters  -0.58
Positive Logits
xual      0.93
ealous    0.84
adows     0.77
ukong     0.71
caution   0.71
whel      0.69
hype      0.67
whelming  0.66
agog      0.65
dden      0.64
Activations Density: 0.123%