INDEX
Explanations
phrases related to legal actions and consequences
punctuation marks and their frequencies in the text
Negative Logits
conclud          -0.68
describ          -0.66
answ             -0.60
grounding        -0.60
baseline         -0.60
aggregation      -0.59
eatures          -0.59
prescriptions    -0.59
concess          -0.59
isolation        -0.58
Positive Logits
nee              0.87
rama             0.82
etc              0.77
Kinnikuman       0.70
uthor            0.69
icio             0.68
supra            0.68
oshi             0.68
080              0.67
wait             0.67
Activations Density: 0.200%