Explanations
phrases that refer to interactions within political or religious communities
Negative Logits
awns        -0.15
ataires     -0.14
ableView    -0.14
้อย         -0.14
ekim        -0.14
賢          -0.14
icrous      -0.13
.reducer    -0.13
ンピ        -0.13
unkt        -0.13
Positive Logits
bounds      0.48
confines    0.43
boundaries  0.43
framework   0.42
limits      0.39
walls       0.38
framework   0.35
context     0.34
frameworks  0.32
bounds      0.32
Activations Density 0.056%