Explanations
phrases related to legal terminology and actions
Negative Logits
guessed       -0.75
suspicions    -0.73
guesses       -0.72
explanations  -0.70
explanation   -0.70
assumptions   -0.69
guessing      -0.68
jectures      -0.67
suggestion    -0.66
judged        -0.66
Positive Logits
Publication   0.73
twimg         0.70
publication   0.65
publishing    0.64
publisher     0.61
Publications  0.61
出版          0.60
Publication   0.59
publications  0.59
publish       0.58
Activations Density: 1.701%