INDEX
Explanations
references to historical or political figures and their associations
Negative Logits
buie            -0.61
Majest          -0.61
uxxxx           -0.59
Phry            -0.58
cheinend        -0.56
physis          -0.56
TextEditing     -0.55
AttributeError  -0.55
childs          -0.54
ḏ               -0.53
Positive Logits
other           0.73
autres          0.71
posteriores     0.62
other           0.61
Dun             0.61
autres          0.59
altre           0.59
otras           0.58
idemia          0.56
+:+             0.56
Activations Density 0.590%