Explanations
connective words and phrases in the text
Negative Logits
Token         Logit
inh           -0.16
Tango         -0.15
reesome       -0.15
ê¸ī           -0.15
createClass   -0.15
ÑĪÑĮ          -0.14
atta          -0.14
azzi          -0.14
erval         -0.14
ANGLES        -0.14
Positive Logits
Token         Logit
effort        0.16
focus         0.15
exert         0.15
eff           0.14
flow          0.14
IDGET         0.14
jour          0.14
action        0.14
claimed       0.14
135           0.13
Activation Density: 0.001%