Explanations
phrases related to connections and relationships between people or entities
phrases that contain slashes, indicating divisions or categories in text
Negative Logits
swell       -0.80
square      -0.78
lifetime    -0.78
densely     -0.77
sincerely   -0.75
Chao        -0.74
younger     -0.74
mate        -0.73
leaflets    -0.73
breed       -0.73
Positive Logits
whatever    1.67
etc         1.67
dist        1.46
coll        1.46
trans       1.45
tem         1.44
super       1.42
non         1.40
control     1.40
workshop    1.40
Activations Density 0.052%