Explanations
assertive statements or positions regarding certain topics or situations
topics related to charges and claims in political or legal contexts
Negative Logits
aughs         -0.71
ETHOD         -0.63
sucks         -0.61
asus          -0.61
orse          -0.59
cer           -0.59
descriptor    -0.58
iversary      -0.58
bye           -0.58
bilt          -0.57
Positive Logits
ranging       1.05
ourcing       0.99
ynthesis      0.94
hops          0.93
hooting       0.90
cape          0.89
pertaining    0.89
afety         0.89
emanating     0.85
abound        0.85
Activations Density: 0.338%