Explanations
instances of factual inaccuracies and evaluations related to reputation and credibility
Negative Logits
mina        -0.16
alma        -0.16
oe          -0.15
åĸ          -0.14
hammer      -0.14
ë§ī         -0.14
/Button     -0.14
ecd         -0.14
localVar    -0.14
ære         -0.14
Positive Logits
accuracy     0.21
accurate     0.17
accuracy     0.17
eck          0.17
Accuracy     0.16
Accuracy     0.16
EMON         0.16
asury        0.14
rel          0.14
ucken        0.14
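The dashboard does not say how these logit lists are produced. A common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding column and report the most negative and most positive results. The sketch below illustrates that ranking; `logit_effects`, `top_and_bottom` and the toy vectors are hypothetical and are not the real model weights.

```python
def logit_effects(direction, unembed):
    """Dot product of a (hypothetical) decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * w for d, w in zip(direction, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy 2-d vectors, made up for illustration only.
direction = [1.0, 0.5]
unembed = {
    "accuracy": [0.2, 0.1],
    "accurate": [0.1, 0.2],
    "mina": [-0.1, -0.1],
    "hammer": [-0.3, 0.0],
}

neg, pos = top_and_bottom(logit_effects(direction, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

Real dashboards may also apply the model's final layer norm or other scaling before ranking, so treat the numbers above as a shape of the computation rather than a reproduction of the listed values.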
Activation Density: 0.192%
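Activation density is usually read as the fraction of token positions on which the feature fires. The dashboard does not state its threshold, so the sketch below assumes "fires" means an activation strictly greater than zero. The `activation_density` helper and the sample data are hypothetical; with 192 active positions out of 100,000 it yields a density equal to the 0.192% shown above, or roughly one active token in every 520.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 192 of which fire.
acts = [0.0] * 100_000
for i in range(0, 192 * 500, 500):
    acts[i] = 1.3  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```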