Explanations
phrases related to observations or experiences
instances of reported observations or anecdotes
Negative Logits
Token      Logit
hest       -0.81
urden      -0.79
aler       -0.76
aven       -0.76
breaker    -0.74
iest       -0.73
undo       -0.72
phe        -0.70
eta        -0.70
worm       -0.70
Positive Logits
Token          Logit
instances      1.36
cases          1.23
examples       1.14
incidents      1.08
suicides       1.07
individuals    1.06
people         1.06
anecdotal      1.04
complaints     1.02
egregious      1.01
Activations Density: 0.377%