INDEX
Explanations
conditional phrases that indicate a hypothetical or suggestion-based context
Negative Logits
Toll     -0.16
o        -0.16
egan     -0.15
vo       -0.14
peer     -0.14
atore    -0.14
uish     -0.14
bench    -0.14
fh       -0.14
todd     -0.13
Positive Logits
eyin     0.17
unny     0.16
okus     0.15
Premi    0.15
ạm       0.14
лях      0.14
ismu     0.14
walls    0.14
une      0.14
876      0.14
Activations Density 0.113%