INDEX
Explanations
concepts related to trust and abuse
Negative Logits
OGND            -0.50
butt            -0.49
IVENESS         -0.47
celu            -0.47
kologi          -0.47
ValueStyle      -0.46
ulose           -0.46
getDoctrine     -0.45
ogaster         -0.44
Билгалдахарш    -0.44
Positive Logits
pourtant        0.82
ellers          0.72
betrayal        0.64
autrefois       0.61
inoxidable      0.61
inoxid          0.61
betrayed        0.60
Pourtant        0.59
disappointing   0.57
trusted         0.56
Activations Density: 0.240%