INDEX
Explanations
phrases related to claims or assertions of identity or status
phrases where someone is making a claim
Negative Logits
Token       Logit
furt        -0.71
course      -0.70
bats        -0.65
specified   -0.65
noticed     -0.65
river       -0.63
apps        -0.60
Rapids      -0.58
cart        -0.58
items       -0.57

Positive Logits
Token       Logit
specialize  0.96
embody      0.89
represent   0.88
derive      0.86
be          0.85
recreate    0.85
perform     0.80
solve       0.80
speak       0.80
have        0.80

Activation Density: 0.034%