INDEX
NEGATIVE LOGITS
list            -0.08
deng            -0.07
manual          -0.07
Grammarly       -0.07
Nazis           -0.07
pir             -0.07
dram            -0.07
difference      -0.07
poster          -0.07
grep            -0.07
POSITIVE LOGITS
Applicable      0.09
accompanied     0.08
substitutions   0.08
applied         0.08
Applic          0.08
disclaimer      0.08
applicable      0.07
Applic          0.07
accompanying    0.07
Applied         0.07
Activations Density: 0.037%