INDEX
Explanations
phrases related to making arguments or stating opinions
phrases that present arguments or claims
Negative Logits
pione        -0.86
ãĤ¼ãĤ¦ãĤ¹    -0.82
aukee        -0.75
obyl         -0.74
ractor       -0.72
inar         -0.72
apult        -0.72
Listener     -0.71
umat         -0.71
inosaur      -0.71
Positive Logits
although       1.01
allowing       0.94
removing       0.92
despite        0.91
excessive      0.89
eliminating    0.88
adopting       0.87
restricting    0.85
insufficient   0.85
limiting       0.83
Activations Density: 0.218%
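
As a rough illustration of what a density figure like this could mean, the sketch below assumes that density is the percentage of sampled token positions where the feature's activation exceeds a threshold. That definition is an assumption, since the page does not state it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumed definition: the page does not say how its density is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which activate the feature.
example = [0.0] * 998 + [1.7, 0.4]
print(f"{activation_density(example):.3f}%")  # prints "0.200%"
```

Under this reading, a density of 0.218% would mean the feature is active on roughly 2 out of every 1,000 sampled tokens.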