INDEX
Explanations
percentages and specific terms
Negative Logits
independência    0.72
confiança        0.69
ropa             0.69
semangat         0.68
rotnie           0.68
cerita           0.67
preferências     0.67
tö               0.66
тира             0.66
გილ              0.65
Positive Logits
&      0.75
OS     0.67
/      0.64
OC     0.62
IA     0.62
(      0.61
IER    0.61
C      0.60
II     0.60
1      0.59
Activations Density 0.001%