INDEX
Explanations
words related to physical thinness or slimness
occurrences of the word "thin" and its variations
Negative Logits
bucks     -0.77
ktop      -0.71
another   -0.65
perty     -0.61
dam       -0.60
Admir     -0.59
ontent    -0.59
0100      -0.59
dad       -0.59
HI        -0.58
Positive Logits
ned       1.57
ning      1.48
ners      1.26
ening     1.07
ened      0.96
layer     0.95
nery      0.94
nesses    0.92
slices    0.89
ness      0.87
Activations Density: 0.059%