Explanations
mentions of situations or conditions being described as extreme
references to extreme conditions or situations
Negative Logits
ËĪ        -0.82
gin       -0.79
shaw      -0.79
nance     -0.79
tein      -0.77
morrow    -0.75
glas      -0.75
bats      -0.74
edin      -0.73
chel      -0.73
Positive Logits
extremes        1.01
extreme         0.97
lengths         0.87
temperatures    0.78
rarity          0.78
Extreme         0.78
instances       0.77
punishment      0.74
Extreme         0.74
punishments     0.74
Activations Density: 0.009%