INDEX
Explanations
phrases related to positive attributes or qualities like well-regarded, well-deserved, and well-resourced
words related to regulation and governance
Negative Logits
knife         -0.68
mania         -0.63
agents        -0.62
pora          -0.61
Decoder       -0.60
magic         -0.60
Discussion    -0.60
wolves        -0.60
Jackets       -0.60
terms         -0.58
Positive Logits
ented         1.09
ited          1.07
ivated        1.06
oured         1.02
ated          1.02
enged         1.02
arded         1.01
ested         1.00
ioned         1.00
ured          0.98
Activations Density 0.172%
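
An activation density figure like the one above is usually read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard, and the dashboard's exact sampling procedure is not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is strictly
    greater than `threshold` (0.0 by default). This is an illustrative
    definition, not necessarily the dashboard's.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.172% would mean the feature is active on roughly 17 of every 10,000 sampled tokens.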