Explanations
phrases indicating potential outcomes or implications
phrases that indicate meaning and consequences
Negative Logits
ĸļ          -0.75
Omn         -0.67
Jur         -0.67
Wend        -0.66
Britann     -0.64
raint       -0.63
Jal         -0.61
Sabb        -0.61
Defense     -0.58
Herz        -0.58
Positive Logits
depending    0.78
ESE          0.72
gettable     0.72
depending    0.71
γ            0.70
pole         0.70
tricky       0.70
safely       0.69
easily       0.68
GOODMAN      0.68
Activation Density: 0.293%
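Activation density is the percentage of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the threshold this dashboard actually uses is not stated here); the function name `activation_density` and the example counts are hypothetical and are not the data behind the figure above.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical example: the feature fires on 5 of 2,000 token positions.
acts = [0.0] * 1995 + [1.2, 0.4, 2.5, 0.9, 3.1]
print(f"{activation_density(acts):.3f}%")  # 0.250%
```

Density is reported as a percentage of all positions scanned, so a sparse feature like this one fires on only a few tokens per thousand.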