Explanations
adjectives followed by 'all'
Negative Logits
pu           -0.66
lf           -0.61
uay          -0.61
virginity    -0.58
avorite      -0.57
proverb      -0.57
cel          -0.57
Ferdinand    -0.57
geries       -0.56
algia        -0.56
Positive Logits
expense      0.88
levels       0.88
angles       0.84
ocating      0.83
times        0.82
å¸           0.78
onge         0.76
seams        0.74
wavelengths  0.74
hazards      0.74
Activations Density: 0.024%