Explanations
themes related to truth, reality, and personal authenticity
Negative Logits
InSection      -0.56
Reuter         -0.55
pria           -0.50
industriels    -0.49
adaptés        -0.48
elé            -0.47
出版年          -0.47
entradas       -0.47
ographique     -0.47
록             -0.47
Positive Logits
reveals        0.79
Exposed        0.78
exposed        0.77
revealing      0.74
revealed       0.74
Exposed        0.73
fragility      0.73
reveal         0.70
exposes        0.67
exposed        0.66
Activations Density 0.183%