INDEX
Explanations
words related to legal cases or people involved in legal disputes
mentions of specific medical cases or conditions
Negative Logits
ergy      -0.75
ager      -0.71
alogue    -0.70
WATCHED   -0.70
NESS      -0.69
iple      -0.67
box       -0.64
ivity     -0.64
alore     -0.62
arians    -0.62
Positive Logits
ppe       1.36
zza       1.34
BILITY    1.10
ppa       1.08
pered     0.99
ffe       0.97
zzo       0.95
BILITIES  0.92
ea        0.91
ÅĤ        0.91
Activations Density 0.093%