INDEX
Explanations
mentions of laws and regulations
the presence of the article "a."
Negative Logits
Own           -0.81
Background    -0.73
Instruct      -0.69
affles        -0.68
Events        -0.68
Runs          -0.67
external      -0.67
roots         -0.67
End           -0.66
Contents      -0.66
Positive Logits
cknowled       1.04
rouse          1.04
spokeswoman    0.96
hallmark       0.95
spokesperson   0.89
couple         0.89
spokesman      0.88
thinly         0.88
reminder       0.87
precursor      0.86
Activations Density: 0.179%