INDEX
Explanations
instances of legal terms or concepts
Negative Logits
Token    Logit
-        -0.67
e        -0.63
-        -0.60
↵        -0.55
ed       -0.54
o        -0.53
your     -0.53
.        -0.51
..       -0.51
you      -0.50
POSITIVE LOGITS
Token    Logit
]:       1.52
?        1.51
).       1.47
]));     1.45
']))     1.44
]]       1.43
".       1.41
]){      1.41
"])      1.39
]))      1.37
Activations Density 0.269%