INDEX
Explanations
references to amounts or measurements
Negative Logits
b       -0.19
bt      -0.16
bdb     -0.16
bare    -0.16
aiser   -0.16
bew     -0.15
hta     -0.15
boro    -0.15
bv      -0.15
in      -0.15
Positive Logits
fang    0.25
gebung  0.24
ring    0.20
gang    0.20
mant    0.20
fas     0.19
fried   0.18
rank    0.17
geben   0.17
ittel   0.16
Activations Density: 0.005%