INDEX
Explanations
adjectives describing qualities with varying levels of intensity
terms associated with legal or reputational implications
Negative Logits
Token       Logit
uers        -0.63
Parables    -0.61
©¶æ         -0.58
ivalry      -0.57
"},         -0.57
aturdays    -0.55
Pacific     -0.54
ools        -0.54
«ĺ          -0.54
Springer    -0.54
Positive Logits
Token       Logit
)           1.45
?)          1.33
-)          1.26
)-          1.24
!)          1.22
)'          1.21
)!          1.18
*)          1.13
)?          1.06
)—          1.05
Activations Density: 0.302%