Explanations
"internal" followed by specific terms
Negative Logits
外  0.54  (outside)
Outside  0.54
の外  0.52  (outside of)
outside  0.51
Outside  0.49
户外  0.48  (outdoors)
বাইরে  0.47  (outside)
外  0.47  (outside)
Außen  0.47  (outside)
outside  0.46
Positive Logits
internal  1.33
Internal  1.22
interne  1.20  (internal)
internally  1.18
Internal  1.14
내부  1.10  (internal)
INTERNAL  1.08
内部  1.02  (internal)
internal  1.02
INTERNAL  0.95
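In SAE feature dashboards, positive and negative logit lists are typically produced by multiplying the feature's decoder direction by the model's unembedding matrix and sorting the resulting per-token effects. The sketch below assumes that method; `w_dec`, `W_U`, `top_logits`, and the toy values are hypothetical and are not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Return w_dec @ W_U: the direct effect of the feature direction on each vocab logit.

    w_dec: hypothetical decoder vector of length d_model.
    W_U:   hypothetical unembedding matrix, d_model rows by vocab_size columns.
    """
    vocab_size = len(W_U[0])
    return [
        sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
        for j in range(vocab_size)
    ]


def top_logits(
    w_dec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    tokens: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    effects = logit_effects(w_dec, W_U)
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens.
tokens = ["internal", "outside", "the"]
w_dec = [1.0, 0.5]
W_U = [
    [1.0, -0.5, 0.0],
    [0.4, -0.2, 0.1],
]
positive, negative = top_logits(w_dec, W_U, tokens, k=1)
print(positive[0][0], negative[0][0])  # prints "internal outside"
```

The sketch returns signed effects. Plain Python lists keep it dependency-free; with a real model the same product would normally be computed as a single matrix-vector multiply.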
Activation Density: 0.009%
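Activation density is usually reported as the share of sampled tokens on which the feature's activation is above zero. At 0.009%, that is roughly 9 tokens in every 100,000. A minimal sketch, assuming a threshold of zero and a flat list of per-token activations (`activation_density` is a hypothetical helper, not the dashboard's code):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 9 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_991 + [2.5] * 9
print(activation_density(sample))  # prints 0.009
```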