INDEX
Explanations
phrases that indicate the implementation or commencement of rules, regulations, or conditions
NEGATIVE LOGITS
ITS        -0.17
slim       -0.16
کت         -0.15
iers       -0.15
sno        -0.15
quest      -0.15
SETTING    -0.15
bare       -0.15
event      -0.14
siblings   -0.14
POSITIVE LOGITS
bomb            0.15
Prairie         0.15
Deferred        0.14
ynes            0.14
normalization   0.14
ibold           0.13
letic           0.13
normalized      0.13
far             0.13
ited            0.13
Activations Density: 0.018%