Explanations
positive or negative evaluations and attitudes towards different topics or circumstances
Negative Logits
kindred        -0.55
unta           -0.52
adr            -0.52
him            -0.49
bub            -0.47
enary          -0.47
lé             -0.46
aternity       -0.46
サーティワン   -0.45
fold           -0.45
Positive Logits
Thoughts        0.68
Person          0.56
though          0.55
however         0.53
Length          0.53
Rating          0.52
itionally       0.52
龍喚士          0.52
Overall         0.51
alyst           0.50
Activation density: 12.161%