INDEX
Explanations
appearances of the word "out"
references to being "out" or a state of being outside
Negative Logits
cius      -0.73
antry     -0.72
cious     -0.69
tyr       -0.67
arsen     -0.65
ingham    -0.61
avorite   -0.60
iosity    -0.59
etsk      -0.58
Pry       -0.57
Positive Logits
fitted    1.07
stretched 1.01
lier      0.94
posts     0.90
doors     0.89
casts     0.86
smart     0.85
wards     0.84
)=(       0.83
flows     0.79
Activations Density 0.154%
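
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled tokens on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 sampled tokens.
toy_activations = [0.0] * 998 + [1.3, 0.4]
density = activation_density(toy_activations)

# Format as a percentage, the same style as the figure reported above.
print(f"{density:.3%}")
```

Under that assumption, a density of 0.154% would correspond to the feature firing on roughly 1 in 650 sampled tokens.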