Explanations
restrictive or complex features
Negative Logits
Token         Logit
GATE          0.42
gateway       0.42
করিতেছি       0.41
Kitchen       0.41
transcribe    0.41
碳            0.41
Kai           0.40
B             0.39
↵             0.38
Fetch         0.38
Positive Logits
Token         Logit
overruling    0.52
backs         0.50
ierungs       0.47
enemy         0.46
aroused       0.46
បង្           0.46
overruled     0.44
angered       0.44
اخت           0.44
دیگر          0.44
Activations Density: 0.001%