Explanations
expressions of personal opinions
Negative Logits
Token       Logit
elle        -0.72
uel         -0.72
IJ          -0.71
Ãį          -0.71
Topics      -0.69
³           -0.68
eks         -0.68
Reviewed    -0.68
Discuss     -0.68
ãĤ¨         -0.67
Positive Logits
Token       Logit
ours        0.87
nam         0.76
nons        0.70
hers        0.69
yours       0.67
theirs      0.67
chard       0.66
outright    0.65
otherwise   0.65
externally  0.63
Activations Density 0.136%