Explanations
references to locations or vague places
Negative Logits
Token        Logit
a            -0.73
n            -0.73
s            -0.71
m            -0.70
p            -0.70
b            -0.69
tis          -0.68
ps           -0.67
n            -0.64
f            -0.64
Positive Logits
Token        Logit
somewhere     2.64
anywhere      2.56
somewhere     2.54
anywhere      2.51
Somewhere     2.48
Anywhere      2.42
nowhere       2.37
Somewhere     2.26
someplace     2.24
Anywhere      2.21
Activation Density: 0.038%