INDEX
Explanations
adjectives indicating judgment or evaluation
phrases or expressions indicating absence, negation, or futility
Negative Logits
ahime       -0.84
[|          -0.73
majority    -0.61
lance       -0.60
sometime    -0.58
gart        -0.58
ghan        -0.57
asus        -0.57
olds        -0.57
AX          -0.56
Positive Logits
except      1.03
whatsoever  1.01
except      0.98
resembling  0.83
besides     0.76
nor         0.73
imilar      0.72
Except      0.71
ificant     0.70
ensical     0.69
Activations Density: 0.172%