Explanations
adjectives related to opinions and criticism
Negative Logits
byter       -0.63
abad        -0.62
enegger     -0.62
entle       -0.61
foreseen    -0.59
ELL         -0.59
quer        -0.57
liam        -0.56
ells        -0.56
elta        -0.53
Positive Logits
of          1.33
thereof     1.14
Of          1.04
Of          1.02
of          1.00
OF          0.94
enough      0.77
oft         0.76
ta          0.75
lest        0.75
Activations Density: 0.179%