Explanations
keywords regarding disclaimers, instructions, or specifications
phrases indicating purpose or intent
Negative Logits
Token               Logit
externalActionCode  -0.66
VIDEOS              -0.65
Pledge              -0.64
reacts              -0.63
falls               -0.62
wounds              -0.62
é¾įå¥ij士            -0.61
enjoys              -0.61
»Ĵ                  -0.61
defenses            -0.60
Positive Logits
Token     Logit
cluded    0.81
ivated    0.81
bidden    0.81
avering   0.78
aneous    0.77
ivable    0.76
ivating   0.76
lain      0.74
clusive   0.74
cd        0.73
Activations Density: 0.155%