INDEX
Explanations
entities or groups of people
terms related to groups of people or entities
Negative Logits
tains       -0.88
Contains    -0.74
Causes      -0.71
Adds        -0.71
Says        -0.66
Adds        -0.66
Allows      -0.65
Needs       -0.64
needs       -0.63
Does        -0.62
Positive Logits
were        1.60
weren       1.55
tended      1.51
stayed      1.37
wore        1.35
flowed      1.34
remained    1.33
became      1.33
retreated   1.32
were        1.31
Activations Density 0.470%