INDEX
Explanations
the word "Out" occurring in the text
references to a specific media outlet or program
Negative Logits
Token               Logit
arsen               -0.78
compr               -0.77
avorite             -0.73
iosity              -0.68
EStream             -0.66
trem                -0.66
=-=-=-=-=-=-=-=-    -0.65
ãĥŁ                 -0.64
tyr                 -0.64
assum               -0.63
Positive Logits
Token               Logit
doors                1.14
stretched            1.07
landish              1.07
raged                1.06
rage                 1.04
fitted               1.03
lander               1.02
skirts               0.99
wards                0.98
numbered             0.98
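Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below illustrates that idea under stated assumptions: the decoder vector `w_dec`, the unembedding `w_u`, the toy vocabulary and all weights are hypothetical, any final layer norm is ignored, and this is not necessarily how this particular dashboard computed its values.

```python
from typing import List, Tuple

def logit_effects(w_dec: List[float], w_u: List[List[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through the unembedding.

    w_dec has length d_model; w_u is d_model x vocab_size.
    Returns one logit contribution per vocabulary token (final layer norm ignored).
    """
    vocab_size = len(w_u[0])
    return [
        sum(w_dec[i] * w_u[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    vocab: List[str], effects: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
vocab = ["doors", "arsen", "rage", "compr"]
w_dec = [1.0, 0.5]
w_u = [
    [1.0, -0.5, 0.8, -0.4],
    [0.2, -0.4, 0.4, -0.2],
]
pos, neg = top_and_bottom(vocab, logit_effects(w_dec, w_u), k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting twice keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.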
Activation density: 0.030%
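Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.030% corresponds to roughly 3 tokens in every 10,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the dashboard does not state its sample or threshold, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy sample: 10,000 tokens, of which the feature fires on 3 (values made up).
acts = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3f}%")
```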