INDEX
Explanations
terms related to urging action or emphasizing importance
phrases that express strong sentiments or opinions about specific elements or experiences
Negative Logits
exting    -0.83
ÃĥÃĤ      -0.80
senal     -0.75
subur     -0.73
oÄŁ       -0.72
Course    -0.70
Tu        -0.70
çīĪ       -0.70
FP        -0.69
bart      -0.69
Positive Logits
redeem            0.77
lesson            0.76
tribute           0.74
formula           0.73
takeaway          0.71
distinguishing    0.69
prophecy          0.69
defining          0.69
irony             0.68
consistency       0.68
Activations Density: 0.165%