INDEX
Explanations
negative or contrasting statements
negations or phrases that express inadequacy or dissatisfaction
Negative Logits
Token        Logit
aughs        -0.81
Ĥİ           -0.74
etimes       -0.73
TIME         -0.72
States       -0.67
Companies    -0.65
WAY          -0.65
Treatment    -0.65
now          -0.64
nton         -0.64
Positive Logits
Token        Logit
flashy       1.21
overly       1.19
terribly     1.14
necessarily  1.12
icable       1.11
icably       1.06
epad         1.05
overpower    1.01
uncommon     0.95
fancy        0.94
Activations Density: 0.171%