INDEX
Explanations
phrases related to making arguments or claims
arguments and claims presented in a discussion
Negative Logits
FORMATION   -0.67
PER         -0.66
Charges     -0.62
Brooks      -0.62
DER         -0.59
TPS         -0.59
KB          -0.57
offenses    -0.57
FORM        -0.57
rows        -0.57
Positive Logits
uably       1.34
uments      1.08
emouth      1.08
roup        1.07
rave        1.07
entin       1.03
raph        1.02
allery      1.00
arin        0.99
regate      0.97
Activations Density 0.034%