INDEX
Explanations
words related to intensity or extremity
references to extreme conditions or events
Negative Logits
athan     -0.85
din       -0.76
ppo       -0.69
ilage     -0.68
beans     -0.67
iak       -0.67
ilus      -0.66
wan       -0.66
know      -0.66
sburgh    -0.66
Positive Logits
vetting         1.05
poverty         0.95
lengths         0.89
temperatures    0.89
amounts         0.84
cases           0.83
measures        0.83
rarity          0.82
caution         0.82
situations      0.81
Activations Density 0.048%