INDEX
Explanations
phrases where something is defined or categorized
the word "defined" in various contexts
Negative Logits
osc         -0.74
eor         -0.72
atl         -0.72
heastern    -0.70
folio       -0.70
ersen       -0.69
rone        -0.69
rax         -0.69
orate       -0.67
rol         -0.67
Positive Logits
follows     0.97
pired       0.84
opposed     0.81
pires       0.78
well        0.77
ocial       0.75
favoring    0.74
criptions   0.74
belonging   0.73
ylum        0.73
Activations Density: 0.100%