INDEX
Explanations
words related to the concept of "sweater."
references to specific items related to "Swe" or "Sweets."
Negative Logits
retract      -0.86
ctors        -0.74
bluff        -0.74
aquarium     -0.73
limestone    -0.73
orc          -0.68
orig         -0.66
ARCH         -0.66
cies         -0.65
Pitt         -0.65
Positive Logits
Swe          3.66
Swe          3.13
swe          2.92
swe          1.68
Sne          1.24
Sweep        1.17
Hoo          1.15
sweets       1.01
Sung         0.98
Sle          0.93
Activations Density 0.037%
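
The page does not define the density figure. One plausible reading, assumed here, is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below computes it under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 37 activate the feature.
sample = [0.0] * 99_963 + [1.0] * 37
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on only a few tokens in every ten thousand, which fits a feature tied to a narrow set of "Swe"-like tokens.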