Explanations
code comments and `public` keywords
Negative Logits
this             0.89
this             0.75
זה               0.64
(                0.62
side             0.61
it               0.61
này              0.61
from             0.60
-                0.60
from             0.59
Positive Logits
/***             0.69
personnaliser    0.61
///////////////  0.60
persecution      0.58
participó        0.57
worrisome        0.56
大战             0.56
/****            0.55
///////          0.55
军事             0.55
Activations Density: 0.021%