INDEX
Explanations
text within brackets or parentheses
Negative Logits
sheltered      0.50
activating     0.42
homogeneous    0.42
planning       0.40
focused        0.40
accessible     0.40
remained       0.40
engagement     0.39
,              0.39
occasional     0.39
Positive Logits
Você           0.52
you            0.50
филосо         0.50
ல              0.50
ໃຊ             0.49
உங்களுக்கு     0.49
흄             0.47
ที่คุณ           0.46
elh            0.46
YOU            0.46
Activations Density: 0.004%