Explanations
animals, family, connection
Negative Logits
Token             Value
(no token text)   1.00
est               0.91
em                0.87
iri               0.86
sehr              0.85
ute               0.85
ungs              0.82
(no token text)   0.80
irth              0.80
'                 0.80
Positive Logits
Token             Value
<unused547>       1.50
т                 1.46
<unused1056>      1.46
<unused1994>      1.44
<unused960>       1.43
<unused99>        1.42
<unused1218>      1.41
<unused2081>      1.39
<unused302>       1.39
<unused297>       1.39
Activations Density: 0.001%