INDEX
Explanations
phrases related to expressing opinions or attributing thoughts to others
statements or claims made by individuals regarding various topics
Negative Logits
riage       -0.66
ricks       -0.63
geries      -0.63
Forward     -0.61
Engineers   -0.61
Quan        -0.60
arnaev      -0.59
endered     -0.59
Quentin     -0.58
Trigger     -0.58
Positive Logits
ģĸ          0.76
©¶æ¥µ       0.72
sorely      0.70
dearly      0.68
itud        0.67
abouts      0.67
é¾į         0.67
·           0.66
deems       0.66
138         0.65
Activations Density 0.369%