INDEX
Explanations
phrases related to legal actions or decisions
Negative Logits
united         -0.70
Feet           -0.66
oline          -0.65
uties          -0.65
yne            -0.65
responsible    -0.64
Ë              -0.63
ructose        -0.63
ocene          -0.62
oh             -0.61
Positive Logits
havoc          1.06
longstanding   0.83
rumours        0.76
proceedings    0.74
any            0.72
everything     0.70
disbelief      0.70
morale         0.70
illusions      0.70
altogether     0.70
Activations Density: 0.172%