Explanations
phrases related to cause and effect, specifically focusing on actions leading to a particular outcome
the word "the" and related phrases indicating significance or attention to specific subjects
Negative Logits
Token        Logit
>:           -0.92
leeve        -0.78
ttp          -0.77
/            -0.76
ossal        -0.76
Supported    -0.74
ãĤ´ãĥ³       -0.74
-->          -0.74
vised        -0.73
perse        -0.71
Positive Logits
Token        Logit
brakes       1.09
entire       1.08
lid          1.07
needle       1.06
curtain      1.00
envelope     0.97
tide         0.97
spotlight    0.95
blame        0.93
burden       0.93
Activations Density 0.241%
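A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires. The sketch below shows that calculation under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 241 active positions out of 100,000 tokens.
sample = [1.5] * 241 + [0.0] * 99_759
print(f"{activation_density(sample):.3%}")
```

With this made-up sample the ratio is 241 / 100,000, which matches the order of magnitude of the reported density; the real figure depends on the dataset and threshold the dashboard actually used.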