INDEX
Explanations
adjectives that express size and feelings
Negative Logits
np          -0.66
aho         -0.65
ugal        -0.63
{"          -0.63
Alright     -0.62
AAA         -0.62
Dynamics    -0.60
Afgh        -0.60
romeda      -0.59
Yard        -0.59
Positive Logits
practically  0.83
even         0.81
barely       0.81
scarcely     0.79
hardly       0.78
ruciating    0.77
almost       0.74
unus         0.70
virtually    0.68
>]           0.66
Activations Density 0.268%
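
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature has a nonzero activation; the function name, the threshold parameter, and the sample data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, of which 3 activate the feature.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.300%
```

Under that reading, a density of 0.268% would mean the feature fires on roughly 27 of every 10,000 tokens.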