Explanations
modifications or variations
instances of the word "modified" and related variations
Negative Logits
Token       Logit
çĦ          -0.83
True        -0.69
OPA         -0.69
Ranked      -0.68
mp          -0.67
velt        -0.66
doms        -0.66
vor         -0.65
FORE        -0.65
Rect        -0.64
Positive Logits
Token       Logit
atile       0.95
hap         0.83
iations     0.81
versions    0.77
wrench      0.77
atility     0.76
ively       0.76
organisms   0.75
itized      0.75
racks       0.73
Activations Density 0.020%