Explanations
categories followed by lists
Negative Logits
acquires    0.41
sl          0.40
NERS        0.38
kar         0.36
object      0.36
intends     0.36
vision      0.35
annoyance   0.35
tu          0.35
chl         0.35
Positive Logits
pecific     0.70
galore      0.65
आहेत        0.63
mith        0.62
cape        0.62
ales        0.61
ystem       0.60
are         0.59
pecies      0.59
ubmit       0.59
Activations Density 0.065%