Explanations
the similarity or sameness of words or concepts across different contexts
Negative Logits
Provided    -0.85
skirts      -0.79
emi         -0.79
bane        -0.76
zy          -0.73
acus        -0.73
xtap        -0.73
uckle       -0.72
itely       -0.72
*=-         -0.72
Positive Logits
thing       1.03
exact       1.02
amount      1.01
kind        0.91
kinds       0.87
basic       0.87
playbook    0.85
vein        0.83
principles  0.83
sort        0.82
Activations Density: 0.042%