Explanations
phrases related to conversations and dialogue between unidentified individuals
references to unidentified or uncharacterized entities, particularly around gender
Negative Logits
sab        -0.68
moons      -0.63
fort       -0.63
sid        -0.63
san        -0.62
manif      -0.61
farmers    -0.61
ho         -0.59
belt       -0.59
dru        -0.59
Positive Logits
ABLE       1.54
ING        1.52
ITY        1.51
ERY        1.48
ISH        1.46
ED         1.46
INE        1.46
OUT        1.46
ARE        1.45
ELL        1.45
Activations Density: 0.061%