INDEX
Explanations
numeric values within texts, such as years, percentages, or quantities
instances of the letter "A" at the beginning of sentences or phrases
Negative Logits
Contents    -0.73
antry       -0.72
Finish      -0.69
Area        -0.68
agents      -0.67
pees        -0.66
acid        -0.65
anism       -0.63
Presents    -0.63
dishes      -0.62
Positive Logits
cknowled    1.13
recent      1.12
curs        1.06
handful     1.01
usterity    1.01
decade      0.99
study       0.99
survey      0.99
glance      0.98
whopping    0.97
Activations Density: 0.190%