INDEX
Explanations
words related to legal matters and procedures
phrases related to emotions and personal experiences
Negative Logits
Token   Logit
ca      -0.76
tons    -0.74
hens    -0.73
etsk    -0.73
fully   -0.73
ties    -0.73
ghai    -0.72
come    -0.71
kish    -0.71
tery    -0.71
Positive Logits
Token   Logit
OUN     1.69
AND     1.65
AMES    1.64
ALE     1.64
URR     1.63
ORT     1.62
ORD     1.62
ATH     1.62
ENT     1.62
IGHTS   1.62
Activations Density: 0.173%