Explanations
data loading and population
Negative Logits
д         0.59
odio      0.58
дт        0.57
perspici  0.56
د         0.55
saddhim   0.55
trenut    0.54
णसी       0.53
менова    0.53
liga      0.52
Positive Logits
י         0.74
aw        0.65
o         0.63
i         0.62
e         0.61
1         0.60
p         0.60
cuticle   0.59
ി         0.59
Inn       0.58
Activations Density 0.002%