INDEX
Explanations
superlatives and generalizations that emphasize a predominant characteristic
phrases that indicate general trends or majority opinions
Negative Logits
heid            -0.90
isa             -0.72
redo            -0.71
acus            -0.69
oak             -0.66
iton            -0.64
atar            -0.60
tis             -0.60
agra            -0.59
arter           -0.58
Positive Logits
mileage         0.69
acci            0.64
uninterrupted   0.63
igent           0.62
obvious         0.61
nesota          0.61
recent          0.61
comprehensive   0.60
¬¼              0.59
cients          0.58
Activations Density 0.055%