Explanations
specific nouns, such as proper nouns, objects, and body parts
keywords related to specific objects, actions, and attributes
Negative Logits
with       -0.79
withd      -0.75
With       -0.61
onds       -0.61
ividual    -0.60
estate     -0.59
infeld     -0.58
owship     -0.57
With       -0.57
eworld     -0.57
Positive Logits
intact      1.05
attached    0.99
enabled     0.93
thrown      0.90
looming     0.89
strapped    0.89
impunity    0.83
flourish    0.82
draped      0.82
inserted    0.80
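The positive and negative logit lists above name the tokens whose output logits this feature pushes up or down the most. The dashboard does not say how the scores are computed. The sketch below assumes a common approach: each score is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists are the top-k and bottom-k of those scores. `logit_effects` and `top_and_bottom` are hypothetical helpers, and the toy vectors in the usage are invented for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every token by projecting the feature's decoder direction onto it.

    Assumption (hypothetical): unembed is laid out d_model x vocab_size,
    so column j is the unembedding vector of vocab[j].
    """
    d_model = len(decoder_dir)
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        effects.append((token, score))
    return effects


def top_and_bottom(
    effects: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    effects = logit_effects(
        decoder_dir=[1.0, 0.5],
        unembed=[[1.0, -0.5, 0.2], [0.1, -0.6, -0.4]],
        vocab=["a", "b", "c"],
    )
    positive, negative = top_and_bottom(effects, k=1)
    print("positive:", positive)
    print("negative:", negative)
```

The effects are kept as a list of pairs instead of a dict because a real vocabulary can contain surface-identical entries, such as the two `With` rows above, that differ only in invisible characters like a leading space.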
Activation Density: 0.517%
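The density figure reads as how often the feature fires. At 0.517%, it is active on roughly 5 of every 1,000 tokens. A minimal sketch follows. It assumes density is the percentage of sampled token positions with an activation above zero. The dashboard states neither the threshold nor the sampling dataset, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires.

    Assumption (hypothetical): a position counts as firing when its
    activation is strictly greater than `threshold` (default 0.0).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # 1,000 positions, 5 of which fire: density of 0.5%.
    sample = [0.0] * 995 + [1.2] * 5
    print(f"{activation_density(sample):.3f}%")
```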