INDEX
Explanations
verbs indicating intensity or strength
pronouns and their associated structures, indicating actions and descriptions
Negative Logits
token              logit
ammy               -0.69
Helpful            -0.67
Angola             -0.66
eworthy            -0.66
Distance           -0.64
Catalyst           -0.63
ugal               -0.63
Alright            -0.61
â̦â̦â̦â̦â̦â̦â̦â̦   -0.60
Dynamics           -0.59
Positive Logits
token              logit
practically        0.83
>]                 0.80
barely             0.77
warrant            0.76
unrecogn           0.76
scarcely           0.75
deserve            0.75
hardly             0.74
unus               0.74
almost             0.73
Activations Density 0.094%