Explanations
phrases related to familiar topics or concepts, especially in the context of equivalences and comparisons
phrases that indicate harm or danger to individuals
Negative Logits
tarians          -0.40
cigarettes       -0.40
estern           -0.36
DragonMagazine   -0.36
unctions         -0.36
Canaver          -0.35
¶                -0.34
itars            -0.34
cedented         -0.34
Econom           -0.33
Positive Logits
overhead         0.34
interchange      0.33
nearby           0.32
.""              0.31
imitation        0.31
respectively     0.31
}.               0.30
glor             0.30
unspecified      0.30
electro          0.30
Activations Density 7.027%