Explanations
phrases indicating consistency and alignment with certain standards or expectations
consistent with
Negative Logits
Katso         -0.47
Gefahr        -0.47
$_['          -0.47
marvin        -0.46
LoginPage     -0.46
waltung       -0.45
îna           -0.45
alamus        -0.44
WebServlet    -0.44
Geographie    -0.44
Positive Logits
Consistent     0.95
consistent     0.95
Consistent     0.88
consistent     0.85
consistency    0.81
Consistency    0.80
Consist        0.79
Consistency    0.74
consistently   0.73
inconsistent   0.72
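Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most extreme entries. The sketch below assumes that method. The names `decoder_dir`, `W_U`, `logit_effects`, and `top_logits`, along with the toy values, are hypothetical and are not taken from this feature.

```python
def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir is a list of d_model floats, and W_U has
    d_model rows, each a list of vocab_size floats (hypothetical shapes).
    Returns one logit effect per vocabulary token.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return (negative, positive) lists of the k most extreme (token, value) pairs."""
    effects = logit_effects(decoder_dir, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
neg, pos = top_logits(
    [1.0, 0.0],
    [[0.9, -0.4, 0.1, 0.5], [0.0, 0.3, 0.2, -0.1]],
    ["consistent", "Gefahr", "the", "consistency"],
    k=2,
)
print(neg)  # [('Gefahr', -0.4), ('the', 0.1)]
print(pos)  # [('consistent', 0.9), ('consistency', 0.5)]
```

The same ranking produces both lists. This is also why the tail of the positive list can still contain related surface forms such as "inconsistent": the ranking reflects only the direction's dot product with each token's unembedding, not the token's meaning.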
Activation Density: 0.017%
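Activation density is usually reported as the fraction of token positions where the feature's activation is above zero. At 0.017%, that works out to roughly 17 firing positions per 100,000 tokens. The sketch below assumes that definition. The function name `activation_density`, its zero `threshold`, and the synthetic activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires above `threshold`.

    Assumption: density counts strictly positive activations; the
    threshold parameter is a hypothetical knob for illustration.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data: 17 nonzero activations out of 100,000 token positions.
acts = [0.0] * 100_000
for idx in range(0, 17 * 5_000, 5_000):
    acts[idx] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.017%
```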