INDEX
Explanations
phrases that indicate a deviation from the norm or regularity
phrases indicating normal or typical behaviors or situations
Negative Logits
Kob         -0.68
Kut         -0.67
ged         -0.61
addons      -0.61
yet         -0.60
Bern        -0.59
Universe    -0.59
Kush        -0.59
Hung        -0.59
health      -0.58
Positive Logits
disclaim    0.90
entimes     0.90
suspects    0.87
reserved    0.85
aspir       0.78
refers      0.76
regarded    0.75
speaking    0.74
relegated   0.73
associated  0.73
Activations Density 0.055%