Explanations
phrases that express contrast or exceptions
Negative Logits
系        -0.15
aben      -0.14
ahoo      -0.14
оло       -0.14
cov       -0.14
(er       -0.13
ovu       -0.13
spring    -0.13
rames     -0.13
uctor     -0.13
Positive Logits
adil          0.19
arro          0.16
dG            0.16
avel          0.15
REFERENCES    0.14
cano          0.14
Leadership    0.14
EMA           0.14
Americ        0.14
èm            0.13
Activations Density 0.043%