Explanations
statements or discussions that express hope or lean towards a particular opinion or viewpoint
Negative Logits
ataka       -0.19
.raises     -0.17
adar        -0.15
Comparable  -0.14
ÑĦеÑĢ       -0.14
vida        -0.14
ells        -0.14
witter      -0.14
asta        -0.14
interp      -0.14
Positive Logits
lean        0.63
leaning     0.59
leans       0.58
leaned      0.56
Lean        0.50
Lean        0.46
lean        0.45
til         0.45
leaning     0.43
åĢ          0.42
Activations Density 0.279%