INDEX
Explanations
sentences summarizing conclusions or key points
Negative Logits
ãĥĺ                    -0.84
roups                  -0.74
ipel                   -0.72
ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³    -0.70
DOM                    -0.68
TIT                    -0.68
chlor                  -0.66
legged                 -0.66
leted                  -0.64
itialized              -0.63
Positive Logits
takeaway       0.98
boils          0.86
message        0.76
verdict        0.75
nings          0.73
perspective    0.72
:              0.72
approach       0.69
nutshell       0.69
lesson         0.68
Activations Density 0.056%
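
Two of the negative-logit tokens (ãĥĺ and ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³) look like raw byte-level BPE token strings rather than corrupted text. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-unicode mapping, which this page does not confirm, and shows how such strings could be turned back into readable text. The helper names `gpt2_byte_to_unicode` and `decode_token` are hypothetical and chosen only for illustration.

```python
def gpt2_byte_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed vocabulary format)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in gpt2_byte_to_unicode().items()}


def decode_token(token):
    """Map each displayed character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĺ", "ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³", "takeaway"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption both strings decode to Japanese katakana, and plain ASCII tokens such as `takeaway` pass through unchanged.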