Explanations
words or phrases that have meanings in different languages
definitions and meanings of words or terms
Negative Logits
YD        -0.87
ADS       -0.84
CSS       -0.84
Vaugh     -0.81
IRC       -0.81
Ns        -0.81
ETS       -0.81
HL        -0.80
JC        -0.78
AMS       -0.78
Positive Logits
"'          0.94
"           0.91
"(          0.84
"[          0.82
liar        0.80
",          0.79
\"          0.78
servant     0.78
pleasure    0.77
foreigner   0.77
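On SAE feature dashboards, positive and negative logit lists are commonly the vocabulary tokens whose unembedding vectors align most and least with the feature's decoder direction. This page does not say how its values were computed, so the following is a minimal sketch under that assumption: each token's score is the dot product of the decoder direction with that token's column of the unembedding matrix `W_U`. The helper names `logit_effects` and `top_and_bottom` and the toy inputs are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: w_unembed has d_model rows, each of length vocab_size,
    so the score for token j is sum_i decoder_dir[i] * w_unembed[i][j].
    """
    vocab_size = len(w_unembed[0])
    scores = [0.0] * vocab_size
    for d_i, row in zip(decoder_dir, w_unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w
    return scores


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int = 10):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 3 hypothetical tokens.
scores = logit_effects([1.0, 0.5], [[1.0, -1.0, 0.0], [0.0, 2.0, -2.0]])
positive, negative = top_and_bottom(scores, ["a", "b", "c"], k=1)
print(positive, negative)  # [('a', 1.0)] [('c', -1.0)]
```

With real model weights, `k=10` would yield two ten-entry lists shaped like the ones on this page, sorted from the strongest effect outward.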
Activations Density 0.094%
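Activation density is usually the share of token positions on which the feature fires. Here is a minimal sketch, assuming "fires" means an activation strictly above a threshold of 0. The `activation_density` helper and its `threshold` parameter are hypothetical and are not taken from this page. Read this way, a density of 0.094% means the feature is active on roughly 94 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 94 active positions out of 100,000 gives a density of 0.094%.
acts = [0.0] * 99_906 + [1.5] * 94
print(f"{activation_density(acts):.3f}%")  # 0.094%
```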